ABSTRACT


This dissertation investigates several main challenges in implementing hardware-based
wireless fading channel emulators, with emphasis on incorporating accurate
correlation properties. Multiple-input multiple-output (MIMO) fading channels are
usually triply-selective with three types of correlation: temporal correlation, inter-tap
correlation, and spatial correlation. The proposed emulators implement the
triply-selective fading Channel Impulse Response (CIR) by incorporating the three types
of correlation into multiple uncorrelated frequency-flat Rayleigh fading waveforms
while meeting real-time requirements for high data-rate, large-sized MIMO, and/or
long CIR channels. Specifically, mixed parallel-serial computational structures are
implemented for Kronecker products of the correlation matrices, which makes the
best tradeoff between computational speed and hardware usage. Five practical fading
channel examples are implemented for RF or underwater acoustic MIMO applications.
The performance of the hardware emulators is verified on an Altera
Field-Programmable Gate Array (FPGA) platform, and the results match the software
simulators in terms of statistical and correlation properties.

The dissertation also contributes to the development of a 2-by-2 MIMO transceiver
testbench that is used to measure real-world fading channels. Intensive channel
measurements are performed for indoor fixed mobile-to-mobile channels, and the
estimated CIRs demonstrate the triply-selective correlation properties.

1  INTRODUCTION


1.1  BACKGROUND 

Wireless fading channel modeling, along with its software simulation and hardware
emulation, is an important topic in wireless communications because it provides
the basis for verifying new algorithm designs, testing transceiver performance,
and analyzing channel capacity. Compared with field tests, wireless fading
channel simulation and emulation are more cost-effective and time-efficient, and can
provide repeatable and reproducible results. Therefore, they have been widely adopted in
academia and industry.

Two types of channel modeling approaches are usually employed: the site-specific
wave propagation approach and the statistical channel modeling approach. The
site-specific wave propagation approach uses Maxwell's equations to simulate wave
propagation through the communication medium, and it requires detailed knowledge of
the physical environment, geometry, and dielectric properties [1]. This approach can
provide channel models for specific sites but requires high computational power for
simulation. In contrast, the statistical channel modeling approach ignores the physical
details and only generates fading channel impulse responses (CIR) with accurate
statistical properties [2,3,4]. Well-designed statistical models can reproduce real-world
fading channels with realistic statistical properties such as the probability distribution
function (PDF), power delay profile, and auto/cross-correlation. Its computational
complexity is much smaller than that of the wave propagation approach, so it has gained
widespread use in the last two decades.

This dissertation takes the statistical modeling approach to multiple-input
multiple-output (MIMO) fading channel modeling and focuses on hardware emulation.
The input-output relationship of a MIMO fading channel in the discrete-time
domain can be modeled by a finite impulse response (FIR) filter. The
mathematical model can be described as follows:


\[
\mathbf{y}(k) = \sum_{l=-L_1}^{L_2} \mathbf{H}_l(k) \cdot \mathbf{x}(k-l) + \mathbf{v}(k) \tag{1.1}
\]


where $\mathbf{x}(k) = [x_1(k), x_2(k), \ldots, x_N(k)]^t$, $\mathbf{y}(k) = [y_1(k), y_2(k), \ldots, y_M(k)]^t$,
and $\mathbf{v}(k) = [v_1(k), v_2(k), \ldots, v_M(k)]^t$ are the input vector, output vector, and noise
vector at time instant $k$, respectively. The parameters $M$ and $N$ are the numbers of receiver
(Rx) elements and transmitter (Tx) elements, and $L_1$ and $L_2$ set the lowest ($-L_1$) and
highest ($L_2$) indices of the channel taps per sub-channel. The matrix $\mathbf{H}_l(k)$ is the
channel matrix at time instant $k$ and delay tap $l$, defined by


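\[
\mathbf{H}_l(k) =
\begin{pmatrix}
h_{1,1}(k,l) & \cdots & h_{1,N}(k,l) \\
\vdots & \ddots & \vdots \\
h_{M,1}(k,l) & \cdots & h_{M,N}(k,l)
\end{pmatrix} \tag{1.2}
\]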


where $h_{m,n}(k,l)$ is the instantaneous channel coefficient at time instant $k$ and delay
tap $l$ for the sub-channel between the $m$-th Rx element and the $n$-th Tx element. The
symbol duration is denoted as $T_s$. For the convenience of hardware emulation,
the matrix $\mathbf{H}_l(k)$ is reshaped to a MIMO channel coefficient vector $\mathbf{h}_{vec}(k)$ as follows:


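\[
\begin{aligned}
\mathbf{h}_{vec}(k) &= [\mathbf{h}_{1,1}(k), \ldots, \mathbf{h}_{1,N}(k) \mid \cdots \mid \mathbf{h}_{M,1}(k), \ldots, \mathbf{h}_{M,N}(k)]^t \\
\mathbf{h}_{m,n}(k) &= [h_{m,n}(k,-L_1), \ldots, h_{m,n}(k,L_2)]
\end{aligned} \tag{1.3}
\]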
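As an illustration of the FIR model (1.1) and the reshaping in (1.3), the following minimal Python sketch evaluates one output vector and stacks the tap matrices into a coefficient vector. The function names and the dictionary-based layout (tap index mapped to a matrix or vector) are assumptions made for this example only and are not part of the emulator design.

```python
def mimo_fir_output(H, x_hist, noise):
    """Evaluate y(k) = sum_l H_l(k) x(k - l) + v(k) for one time instant.

    H      : dict, tap index l -> M x N matrix H_l(k) (list of rows)
    x_hist : dict, tap index l -> length-N input vector x(k - l)
    noise  : length-M noise vector v(k)
    """
    y = list(noise)
    for l, H_l in H.items():
        x = x_hist[l]
        for m in range(len(y)):
            y[m] += sum(H_l[m][n] * x[n] for n in range(len(x)))
    return y


def reshape_to_vec(H, L1, L2):
    """Stack coefficients as in (1.3): sub-channel (m, n) outermost, tap l innermost."""
    first = H[-L1]
    M, N = len(first), len(first[0])
    return [H[l][m][n]
            for m in range(M)
            for n in range(N)
            for l in range(-L1, L2 + 1)]


if __name__ == "__main__":
    # Toy 2-by-2 channel with two taps (L1 = 0, L2 = 1); values are arbitrary.
    H = {0: [[1, 2], [3, 4]], 1: [[0.5, 0], [0, 0.5]]}
    print(mimo_fir_output(H, {0: [1, 1], 1: [2, 0]}, [0.0, 0.0]))
    print(reshape_to_vec(H, 0, 1))
```

The resulting vector has $MNL$ entries with $L = L_1 + L_2 + 1$, which is the length assumed by the correlation step that follows.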


A practical MIMO fading channel exhibits three types of correlation. Spatial
correlation models space-selectivity; temporal correlation describes
time-selectivity; inter-tap correlation captures frequency-selectivity. Such a channel is
therefore referred to as a MIMO triply-selective fading channel [2]. Spatial correlation is
usually measured and predefined according to the properties of the multiple antennas, and is
denoted by the Rx correlation matrix $\boldsymbol{\Psi}_{Rx}$ and the Tx correlation matrix
$\boldsymbol{\Psi}_{Tx}$. The inter-tap correlation is denoted by the inter-tap correlation
matrix $\mathbf{C}_{ISI}$. With these correlation matrices, the MIMO channel coefficient
vector can be computed as follows:

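\[
\mathbf{h}_{vec}(k) = \mathbf{C}_h^{1/2}(0) \cdot \boldsymbol{\Phi}(k)
= \left( \boldsymbol{\Psi}_{Rx}^{1/2} \otimes \boldsymbol{\Psi}_{Tx}^{1/2} \otimes \mathbf{C}_{ISI}^{1/2} \right) \cdot \boldsymbol{\Phi}(k) \tag{1.4}
\]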

where the vector $\boldsymbol{\Phi}(k) = [Z_1(k), \ldots, Z_{MNL}(k)]^t$ consists of the CIRs of multiple
frequency-flat fading sub-channels at time instant $k$. The operators "$\otimes$" and "$(\cdot)^{1/2}$" are the
Kronecker product and the square root of a matrix, respectively. More details about this
equation can be found in [2].
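The correlation step in (1.4) can be sketched in a few lines of Python. This is only a floating-point reference under simplifying assumptions (the three square-root matrices are already available as nested lists, and the full Kronecker product is formed explicitly); the helper names `kron`, `matvec`, and `correlate` are hypothetical and chosen for this example.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(B), len(B[0])
    return [[A[i // rows_b][j // cols_b] * B[i % rows_b][j % cols_b]
             for j in range(len(A[0]) * cols_b)]
            for i in range(len(A) * rows_b)]


def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def correlate(psi_rx_half, psi_tx_half, c_isi_half, phi):
    """h_vec(k) = (Psi_Rx^(1/2) kron Psi_Tx^(1/2) kron C_ISI^(1/2)) * Phi(k)."""
    c_half = kron(kron(psi_rx_half, psi_tx_half), c_isi_half)
    return matvec(c_half, phi)


if __name__ == "__main__":
    eye2 = [[1, 0], [0, 1]]
    # With identity correlation matrices the output equals the input vector.
    print(correlate(eye2, eye2, eye2, [1, 2, 3, 4, 5, 6, 7, 8]))
```

Forming the full $(MNL) \times (MNL)$ matrix is convenient in software but costly in hardware, which is the motivation for the implementation techniques discussed in Section 1.2.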

The CIR of the $i$-th frequency-flat fading sub-channel, $Z_i(k)$, exhibits temporal
correlation and can be generated by a flat Rayleigh fading generator (FRFG) using the
sum of sinusoids (SoS) method in [3]. The SoS method is described by the following
equations:


\[
\begin{aligned}
Z_i(k) &= Z_{c_i}(k) + j Z_{s_i}(k), \\
Z_{c_i}(k) &= \sqrt{\frac{2}{M}} \sum_{m=1}^{M} \cos\bigl( 2\pi ( f_d k T_s \cos\alpha_{m,i} + \phi_{m,i} ) \bigr), \\
Z_{s_i}(k) &= \sqrt{\frac{2}{M}} \sum_{m=1}^{M} \cos\bigl( 2\pi ( f_d k T_s \sin\alpha_{m,i} + \varphi_{m,i} ) \bigr), \\
\alpha_{m,i} &= \frac{\pi ( m - 0.5 + \theta_i )}{2M}, \quad m = 1, 2, \ldots, M.
\end{aligned} \tag{1.5}
\]


where $Z_i(k)$ is the complex CIR of the $i$-th frequency-flat fading channel at time instant $k$, $f_d$
is the maximum Doppler frequency, $M$ is the total number of sinusoids, and $j = \sqrt{-1}$.
The angle of arrival $\alpha_{m,i}$ is randomized by the variable $\theta_i$. The random variables
$\phi_{m,i}$ and $\varphi_{m,i}$ are the random phases of the in-phase and quadrature components,
respectively. The random variables $\phi_{m,i}$, $\varphi_{m,i}$, and $\theta_i$ are statistically independent
and uniformly distributed on $[-0.5, 0.5]$.
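A floating-point sketch of the SoS generator is given below for reference. It follows the structure described above (phases and the angle offset drawn uniformly on $[-0.5, 0.5]$ and scaled by $2\pi$ inside the cosine). The amplitude factor $\sqrt{2/M}$ is an assumption taken from the common SoS formulation, and the function name `make_sos_generator` and its parameters are hypothetical; the hardware FRFG uses fixed-point arithmetic and look-up tables instead.

```python
import math
import random


def make_sos_generator(num_sinusoids, fd, ts, rng):
    """Return a function k -> Z_i(k) for one flat Rayleigh fading sub-channel."""
    M = num_sinusoids
    theta = rng.uniform(-0.5, 0.5)                       # angle-of-arrival offset
    phi_c = [rng.uniform(-0.5, 0.5) for _ in range(M)]   # in-phase phases
    phi_s = [rng.uniform(-0.5, 0.5) for _ in range(M)]   # quadrature phases
    alpha = [math.pi * (m - 0.5 + theta) / (2 * M) for m in range(1, M + 1)]
    scale = math.sqrt(2.0 / M)                           # assumed normalization

    def z(k):
        zc = scale * sum(
            math.cos(2 * math.pi * (fd * k * ts * math.cos(a) + p))
            for a, p in zip(alpha, phi_c))
        zs = scale * sum(
            math.cos(2 * math.pi * (fd * k * ts * math.sin(a) + p))
            for a, p in zip(alpha, phi_s))
        return complex(zc, zs)

    return z


if __name__ == "__main__":
    # Example values only: 8 sinusoids, 100 Hz Doppler, 0.1 ms symbol duration.
    z = make_sos_generator(8, 100.0, 1e-4, random.Random(1))
    print([z(k) for k in range(3)])
```

Independent sub-channels are obtained by creating one generator per index $i$, each with its own random draws.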

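The inter-tap correlation matrix $\mathbf{C}_{ISI}$ is related to the channel power delay profile
(PDP), the transceiver shaping/matching filters, and the symbol rate. According to [2], the
coefficients of $\mathbf{C}_{ISI}$, $c(l_1, l_2)$, can be computed by the following equations:

\[
\begin{aligned}
c(l_1, l_2) &= \sum_{i=1}^{K} \sigma_i^2 \, R_{p_T p_R}(l_1 T_s - \tau_i) \, R^*_{p_T p_R}(l_2 T_s - \tau_i), \\
G(\tau) &= \sum_{i=1}^{K} \sigma_i^2 \, \delta(\tau - \tau_i)
\end{aligned} \tag{1.6}
\]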

where $G(\tau)$ is the power delay profile, $K$ is the total number of resolvable paths,
$\sigma_i^2$ is the power of the $i$-th path with delay $\tau_i$, and $R_{p_T p_R}(\cdot)$ is the convolution
of the Tx shaping filter and the Rx matching filter.
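The computation in (1.6) can be illustrated with a short Python sketch. The combined filter response is replaced here by a sinc pulse purely as a stand-in (an assumption for this example; the actual response depends on the chosen shaping and matching filters), and the names `sinc_pulse` and `intertap_matrix` are hypothetical.

```python
import math


def sinc_pulse(t, ts):
    """Stand-in for the combined Tx/Rx filter response (assumed for illustration)."""
    x = t / ts
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def intertap_matrix(powers, delays, ts, l1, l2, pulse):
    """Build C_ISI with entries c(a, b) for taps a, b in -l1..l2, following (1.6)."""
    taps = list(range(-l1, l2 + 1))

    def c(a, b):
        return sum(
            p * complex(pulse(a * ts - tau, ts))
              * complex(pulse(b * ts - tau, ts)).conjugate()
            for p, tau in zip(powers, delays))

    return [[c(a, b) for b in taps] for a in taps]


if __name__ == "__main__":
    # Two-path example profile with arbitrary powers and delays (in units of ts = 1).
    for row in intertap_matrix([0.7, 0.3], [0.0, 0.4], 1.0, 1, 2, sinc_pulse):
        print([round(v.real, 4) for v in row])
```

The matrix is Hermitian by construction, which is the property the symmetric storage module described in Section 1.2 relies on.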

Ignoring the correlation properties results in inaccurate MIMO channel models,
leads to incorrect CIRs, and may cause transceiver designs to fail. For example,
temporal correlation, caused by Doppler shift, can affect the selection of coding length
and receiver performance. Spatial correlation, caused by insufficient spacing between
antenna elements, can reduce channel capacity drastically. Inter-tap correlation can
affect channel gains at different frequencies and thus influences equalizer design, adaptive
modulation selection, and multi-user channelization. Therefore, considering all
three types of correlation is the key, and also the difficult, aspect of accurate MIMO
channel modeling.

Several software-based fading channel simulators have been developed to generate
CIRs of MIMO fading channels employing general-purpose processors and floating-point
algorithms. A discrete-time MIMO triply-selective fading channel simulator has
been proposed in [2]. It computes the inter-tap correlation matrix according to the
power delay profile, and then incorporates the inter-tap and spatial correlation
matrices into multiple uncorrelated frequency-flat fading CIRs via the Kronecker product.
Another simulator, proposed in [4], synthesizes correlated vector channels (with a
user-specified correlation function) using the auto-regressive modeling method to shape the
spectrum of uncorrelated white Gaussian processes. Software-based fading channel
simulators are widely used for testing research algorithms in non-real-time settings.

For real-time testing and instrumentation, however, hardware-based wireless
fading channel emulators are required to generate analog fading waveforms or digital
CIRs in real time. Implementing hardware-based emulators for MIMO fading channels
with accurate correlation properties is more difficult than software simulation
due to timing and resource constraints. Existing hardware-based channel emulators
in commercial products or academic papers often simplify their design by ignoring
some of the correlation functions in MIMO channels. For example, existing commercial
MIMO emulators, such as the NoiseCom MP-2500, Agilent N5115B, and Azimuth
ACE-MX/440B, only implement the temporal correlation function of fading channels.
Many so-called MIMO emulators in the literature [5,6] only implement multiple
uncorrelated frequency-flat fading sub-channels or consider only spatial or temporal
correlation.

1.2  PROBLEM  STATEMENT  AND  DESIGN  APPROACH 

This dissertation investigates several main challenges in hardware-based
triply-selective MIMO fading channel emulators with accurate correlation properties. These
challenges include incorporating correlation matrices into channel models, generating
CIR outputs in real time, and making tradeoffs between processing speed and hardware
resource usage.

The first challenge is to implement matrix computation modules in hardware.
To incorporate the three types of correlation in (1.4) into MIMO fading channels
in hardware emulators, extensive matrix computations such as the Kronecker product,
matrix square root, and matrix multiplication are required. Implementing them
effectively in hardware is challenging due to the 2-D nature of matrix computations. In
particular, computing the correlation coefficients in (1.6) and the matrix square roots are
the most complex tasks for hardware implementation.

The second challenge is to meet the real-time output requirements of different
fading channels. The Kronecker product, matrix square root, and matrix multiplication
require a large number of multiplications and additions, especially when the
matrices are large. In practical RF and underwater acoustic wireless channels,
MIMO sizes, $M \times N$, range from $2 \times 2$ to $8 \times 12$. The CIR length, $L$, can vary
from ten to several hundred. The resulting matrix, $\mathbf{C}_h^{1/2}(0)$, can have a size as large
as $1000 \times 1000$. The implementation challenge lies in how to speed up the
computation via parallel processing and pre-computing, and how to reduce the
computational load algorithmically.

The third challenge is to achieve a good balance between processing speed and
hardware usage. For example, using a pre-computed correlation matrix, $\mathbf{C}_h^{1/2}(0)$, may
eliminate the huge computational load completely and achieve high processing speed,
but this approach requires a huge amount of memory to store the pre-computed data
and places stringent requirements on hardware memories. Another example is the
selection between parallel and serial structures. Using fully parallel processing for all
MIMO sub-channels can increase processing speed but again requires large hardware
resources that are linearly proportional to the size of the channel. Using fully serial
processing saves hardware resources but requires a large amount of processing time.
Analyzing the real-time requirements of different MIMO channels and designing balanced
implementations are the challenges as well as the contributions of this dissertation.

The approach to dealing with the hardware challenges utilizes four techniques:

1.  Pre-compute three small-sized matrices, $\boldsymbol{\Psi}_{Rx}^{1/2}$, $\boldsymbol{\Psi}_{Tx}^{1/2}$, and $\mathbf{C}_{ISI}^{1/2}$, rather than the
large-sized $\mathbf{C}_h^{1/2}(0)$, and then implement the Kronecker product to compute $\mathbf{C}_h^{1/2}(0)$
from $\boldsymbol{\Psi}_{Rx}^{1/2}$, $\boldsymbol{\Psi}_{Tx}^{1/2}$, and $\mathbf{C}_{ISI}^{1/2}$. This technique achieves a good tradeoff between
processing speed and memory usage.

2.  Develop a mixed parallel-serial (mixed P-S) computational structure, which
employs different numbers of computational paths to compute the Kronecker
product and vector multiplication in parallel. The computational speed of this
mixed P-S structure is flexible and adjustable to meet the various real-time
requirements of different triply-selective fading channels.

3.  Design a storage module that takes advantage of the symmetry
of $\mathbf{C}_{ISI}^{1/2}$ to save memory. This module stores only half of the coefficients
of $\mathbf{C}_{ISI}^{1/2}$ and rebuilds the other half from the stored half. When long
CIR channels with a large-sized $\mathbf{C}_{ISI}^{1/2}$ are emulated, this module can save a
large amount of memory.

4.  Efficiently reuse the FRFG of (1.5) to generate the multiple independent signals
required in $\boldsymbol{\Phi}(k)$ of (1.4). The proposed method employs one FRFG to generate
up to hundreds of frequency-flat fading sub-channels in parallel. It makes use of
Ping-Pong buffers to buffer the outputs of the FRFG and to synchronize the rate of
these outputs with the mixed P-S computational paths. A large amount of hardware
resources is saved by reducing multiple FRFGs to only one.
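The memory saving behind the first and third techniques can be made concrete with a small calculation. The sketch below counts stored matrix entries for three storage strategies; the function name `storage_counts` is hypothetical, the channel size (2-by-6 with 100 taps) is chosen only for illustration, and counting the symmetric half as $L(L+1)/2$ entries (diagonal included) is an assumption about how "half" is stored.

```python
def storage_counts(m, n, l):
    """Return entry counts for three ways of storing the correlation data.

    full      : the pre-computed (MNL) x (MNL) matrix
    small     : the three small square-root matrices (M x M, N x N, L x L)
    small_sym : as 'small', but only the upper triangle of the L x L matrix
    """
    full = (m * n * l) ** 2
    small = m * m + n * n + l * l
    small_sym = m * m + n * n + l * (l + 1) // 2
    return full, small, small_sym


if __name__ == "__main__":
    full, small, small_sym = storage_counts(2, 6, 100)
    print(full, small, small_sym)
```

For this illustrative size, storing the three small matrices needs roughly two orders of magnitude fewer entries than the full matrix, and the symmetric storage roughly halves the remainder, at the cost of computing the Kronecker product on the fly.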

The proposed MIMO triply-selective emulators are implemented and tested
on an FPGA platform. The Altera Stratix III EP3SL150F1152C2N FPGA/DSP
development kit is employed as the hardware platform. This development kit contains
a Stratix III EP3SL150F1152C2N FPGA chip, 72 MB SRAM, 16 MB flash memory,
display LEDs, push-buttons, DIP switches, and data conversion high-speed mezzanine
daughter cards. The Stratix III FPGA chip features 142000 logic elements (LEs), 5499
Kbits of memory, 384 multiplier blocks, eight phase-locked loops (PLLs), 16 global
clock networks, and 736 user I/Os. Altera Quartus II v9.1, DSP Builder, and Matlab
Simulink are used for hardware development. Several fading channel examples
are implemented on the development kit, and their output results are analyzed and
verified in Matlab.

In addition, a hardware wireless 2-by-2 MIMO channel testbench is developed in
this dissertation. This testbench can be used to study the CIRs and correlation
properties of real-world MIMO channels. Experimental results have verified the spatial and
temporal correlation properties with indoor fixed mobile-to-mobile MIMO channels.


1.3  SUMMARY  OF  CONTRIBUTIONS 

The research in this dissertation addresses the technical challenges in the hardware
implementation of triply-selective MIMO fading channel emulators and the wireless
MIMO channel testbench. My work results in one journal publication, one journal
submission, and four conference publications. The complete publication list can be
found in Section 3. The technical contributions of the dissertation are:

1.  A new hardware implementation method for an on-chip inter-tap correlation $\mathbf{C}_{ISI}^{1/2}$
generator is proposed and successfully incorporated into triply-selective fading
channel emulators. So far, none of the other existing emulators implement all
three types of correlation. The algorithm for computing $\mathbf{C}_{ISI}^{1/2}$ is based on (1.6)
and the matrix square root. The $\mathbf{C}_{ISI}^{1/2}$ generator employs a LUT scheme and a serial
computational structure to generate the coefficients of $\mathbf{C}_{ISI}^{1/2}$ one by one. In order
to compute the matrix square root, the $\mathbf{C}_{ISI}^{1/2}$ generator implements the Jacobi
algorithm for singular-value decomposition, the square root calculation for each
diagonal element, and the matrix multiplication among three matrices. The
$\mathbf{C}_{ISI}^{1/2}$ generator can be run once at the beginning of each simulation trial, and
its results can be stored and used for the entire trial.

2.  A mixed P-S computational structure and a fully serial FRFG are proposed and
incorporated into the triply-selective fading channel emulators. Compared with the
fully serial and fully parallel structures, the mixed P-S structure makes the best tradeoff
between computational speed and hardware usage for extremely time-consuming modules
such as the Kronecker product and vector multiplication. It employs parallel
computational paths to generate multiple results of the Kronecker product and
vector multiplication in one clock period. The number of computational paths
can be adjusted to meet the real-time requirements of different fading channels.
Our tests show that the mixed P-S structure can handle MIMO channels
with $(MNL)$ up to 1600. According to equation (1.5), the serial FRFG
takes the Doppler frequency and symbol period as inputs, and generates multiple
frequency-flat fading sub-channels in parallel. Linear feedback shift register
(LFSR) random number generators (RNG) and an accurate LUT scheme are
employed to generate the results of $\cos(\cdot)$ and $\sin(\cdot)$ in equation (1.5) at a precision
level of $6.1 \times 10^{-5}$. Sub-channels generated by this FRFG are shown to have
accurate statistical properties and temporal correlation.

3.  The hardware implementation of the wireless MIMO channel testbench includes
transmitter and receiver design. On the transmitter side, the frame assembly
module and digital up converter (DUC) are implemented on the FPGA
development kit. On the receiver side, the bandpass sampler, digital down converter
(DDC), frame synchronization module, carrier phase detection and compensation
module, and frame extraction module are implemented. Based on real-world
measurement experiments, this testbench can provide experimental results
for CIR estimation and correlation matrix estimation. Experimental
results demonstrate that the discrete-time triply-selective fading channel can be
expressed as separable temporal, inter-tap, and spatial correlations. The spatial
and inter-tap correlation matrices can be estimated through the decomposition
of the channel coefficient covariance matrix.
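The three-step matrix square root mentioned in the first contribution (decomposition, square root of each diagonal element, product of three matrices) can be sketched in floating point as follows. The sketch assumes a real symmetric positive semidefinite input, for which the singular-value decomposition coincides with the eigendecomposition, and uses a plain cyclic Jacobi iteration. The names `jacobi_eigh` and `sqrtm_symmetric` are hypothetical, and the fixed-point data path and any complex-valued handling of the hardware generator are not modeled.

```python
import math


def jacobi_eigh(a, sweeps=50, tol=1e-22):
    """Cyclic Jacobi iteration for a real symmetric matrix: A = V diag(w) V^T."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle (|theta| <= pi/4) that zeroes a[p][q].
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):          # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def sqrtm_symmetric(a):
    """Square root via V diag(sqrt(w)) V^T (product of three matrices)."""
    w, v = jacobi_eigh(a)
    n = len(a)
    roots = [math.sqrt(max(lam, 0.0)) for lam in w]
    return [[sum(v[i][k] * roots[k] * v[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in sqrtm_symmetric([[2.0, 1.0], [1.0, 2.0]]):
        print([round(x, 6) for x in row])
```

Because the result depends only on the power delay profile and filters, it needs to be computed once per trial, which matches the run-once usage described in the first contribution.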
PAPER

I.  HARDWARE  EMULATION  OF  WIDEBAND 
CORRELATED  MULTIPLE-INPUT 
MULTIPLE-OUTPUT  FADING  CHANNELS 

Fei  Ren  and  Yahong  Rosa  Zheng 

Abstract: A low-complexity hardware emulator is proposed for wideband, correlated,
multiple-input multiple-output (MIMO) fading channels. The proposed emulator
generates multiple discrete-time channel impulse responses (CIR) at the symbol
rate and incorporates three types of correlation functions of the subchannels via the
Kronecker product: the spatial correlation between transmit or receive elements,
temporal correlation due to Doppler shifts, and inter-tap correlation due to multipath.
The Kronecker product is implemented by a novel mixed parallel-serial (mixed P-S)
matrix multiplication method to reduce memory storage and to meet the real-time
requirement in high data-rate, large MIMO size, or long CIR systems. We present two
practical MIMO channel examples implemented on an Altera Stratix III EP3SL150F
FPGA DSP development kit: a 2-by-2 MIMO WiMAX channel with a symbol rate of
1.25 million symbols/second and a 2-by-6 MIMO underwater acoustic channel with a
100-tap CIR. Both examples meet the real-time requirement using only 12-14 percent of
the hardware resources of the FPGA.


1  INTRODUCTION 


Fading channel emulators provide a fast and low-cost method for testing
and verifying new algorithm designs, transceiver performance, and channel capacity
analysis [1,2,3]. Many products are available commercially for emulating
single-input single-output (SISO) or multiple-input multiple-output (MIMO) fading
channels. For example, the NoiseCom MP-2500 multipath fading emulator can emulate
SISO frequency-selective fading channels with up to 12 delay paths. The Agilent
N5115B baseband studio test set features standards-based fading configurations
and can support fading channels with up to 48 delay paths. The Rohde & Schwarz
ABFS simulator offers two independent six-path baseband fading channels with
pre-programmed fading models from mobile radio standards [4]. The Azimuth ACE-400WB
supports up to 4-by-4 MIMO fading channels in real time with antenna
correlation [5]. Elektrobit's Propsim F8 RF channel emulator can support up to 16
MIMO fading channels with various radio interfaces such as 802.11n, 3GPP LTE,
WiMAX, and Wi-Fi [6]. Although most of them are equipped with advanced features
such as fading channel profiles specified by current standards, bi-directional channel
modeling, and RF interfaces, these existing emulators only provide multiple independent
fading subchannels with the temporal correlation function implemented through
Doppler spectrum filtering or Sum of Sinusoids (SoS). However, practical MIMO fading
channels usually exhibit all three types of correlation functions and are referred to as
triply-selective channels: time-selectivity due to Doppler (described by temporal
correlation), frequency-selectivity due to multipath (described by inter-tap correlation),
and space-selectivity (associated with the spatial correlation of transmitters and
receivers) [3]. It has been shown in [1] that these correlation functions have a significant
impact on channel capacity, bit error rate (BER), and transceiver design. Ignoring
these correlation functions will lead to unrealistic testing results.

Incorporating correlation functions into fading subchannels is the key but
difficult aspect of accurately generating correlated MIMO fading channels. Many
software-based channel simulators [1,2,3,7,8,9,10] have successfully simulated
doubly-selective or triply-selective correlated fading channels, and they provide the
theoretical foundation for hardware-based channel emulators. Recently, research on
hardware-based channel emulators for doubly-selective fading channels has been reported
in [11,12,13,14,15], where [11,12,13] propose frequency-selective SISO fading channel
emulators, and [14,15] report MIMO fading channel emulators without spatial and/or
inter-tap correlation. We recently developed a hardware-based MIMO fading channel
emulator [16] incorporating all three types of correlation functions based on the
software simulation method in [3]. This emulator [16] computes the three correlation
matrices in hardware and can emulate a baseband MIMO triply-selective fading
channel with $(M \times N \times L) = 160$, where $M$ is the number of receive elements, $N$ is the
number of transmit elements, and $L$ is the number of taps. It is more challenging for
such a correlated MIMO fading channel emulator to meet the real-time requirement
in high data-rate, large MIMO size, or long channel impulse response (CIR) fading
channels.

In this paper, we improve the MIMO fading channel emulator in [16] through
a novel mixed parallel-serial (mixed P-S) multiplication structure and two sets of
Ping-Pong buffers to achieve real-time implementations of large-dimension MIMO
channels. The new emulator is capable of generating MIMO baseband equivalent
fading channels with up to $(M \times N \times L) = 1600$. This is equivalent to either 1600
independent frequency-flat fading channels, or 16 SISO frequency-selective fading
channels with 100 taps each, or an N-by-M ($MN \le 16$) triply-selective fading channel
with 100 taps per subchannel. To demonstrate the capability and accuracy of the
emulator, two typical MIMO fading channel examples, a 2-by-2 WiMAX channel with
a short symbol duration of 0.8 $\mu$s and a 2-by-6 underwater acoustic channel with
100-tap CIRs, are implemented on a Stratix III EP3SL150F FPGA DSP development
kit, and their outputs are shown to have accurate correlation properties. Less than
15 percent of the hardware resources is required in these two examples, and the
real-time requirements are met. The proposed MIMO channel emulators are tested via
Hardware-in-Loop (HIL) models in Simulink.


2  THE  MATHEMATICAL  MODEL


The mathematical model of the proposed emulator is the discrete-time MIMO
triply-selective fading model in [3]. Consider a MIMO channel with $N$ transmit and
$M$ receive elements. The input-output relationship of the channel in the discrete-time
domain is described as


\[
\mathbf{y}(k) = \sum_{l=-L_1}^{L_2} \mathbf{H}(k,l) \cdot \mathbf{x}(k-l) + \mathbf{v}(k) \tag{1}
\]


where $\mathbf{x}(k) = [x_1(k), x_2(k), \ldots, x_N(k)]^t$ is the transmitted signal vector,
$\mathbf{y}(k) = [y_1(k), y_2(k), \ldots, y_M(k)]^t$ is the received signal vector, the superscript $(\cdot)^t$ is the
transpose operator of a matrix or vector, and $\mathbf{v}(k) = [v_1(k), v_2(k), \ldots, v_M(k)]^t$ is the
background white Gaussian noise. The symbol duration is assumed to be $T_s$. The
variables $L_1$ and $L_2$ are nonnegative integers representing the range of delay taps,
so the total channel length is $L = L_1 + L_2 + 1$ taps. The MIMO
channel matrix $\mathbf{H}(k,l)$ at time index $k$ and delay tap $l$ is defined by


\[
\mathbf{H}(k,l) =
\begin{pmatrix}
h_{1,1}(k,l) & \cdots & h_{1,N}(k,l) \\
\vdots & \ddots & \vdots \\
h_{M,1}(k,l) & \cdots & h_{M,N}(k,l)
\end{pmatrix} \tag{2}
\]

For the convenience of description, we reshape the matrix $\mathbf{H}(k,l)$ to an $(MNL) \times 1$
coefficient vector as


\[
\mathbf{h}_{vec}(k) = [\mathbf{h}_{1,1}(k), \ldots, \mathbf{h}_{1,N}(k) \mid \cdots \mid \mathbf{h}_{M,1}(k), \ldots, \mathbf{h}_{M,N}(k)]^t \tag{3}
\]

where $\mathbf{h}_{m,n}(k)$ is the complex coefficient vector of the $(m,n)$-th subchannel at time
index $k$, given by $\mathbf{h}_{m,n}(k) = [h_{m,n}(k,-L_1), \ldots, h_{m,n}(k,L_2)]$. Based on the software
model in [3], the vector $\mathbf{h}_{vec}(k)$ can be generated by

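\[
\mathbf{h}_{vec}(k) = \mathbf{C}_h^{1/2}(0) \cdot \boldsymbol{\Phi}(k)
= \left( \boldsymbol{\Psi}_{Rx}^{1/2} \otimes \boldsymbol{\Psi}_{Tx}^{1/2} \otimes \mathbf{C}_{ISI}^{1/2} \right) \cdot \boldsymbol{\Phi}(k) \tag{4}
\]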

where $\otimes$ denotes the Kronecker product and $\mathbf{X}^{1/2}$ is the square root of matrix $\mathbf{X}$ such
that $\mathbf{X} = \mathbf{X}^{1/2} \cdot (\mathbf{X}^{1/2})^h$, with the superscript $(\cdot)^h$ being the Hermitian operator. The
spatial correlation matrices $\boldsymbol{\Psi}_{Rx}$ and $\boldsymbol{\Psi}_{Tx}$ are determined by the properties of the receive
and transmit elements, respectively, and are usually known to users in advance. The inter-tap
covariance matrix $\mathbf{C}_{ISI}$ is computed according to the power delay profile using (17)
in [3]. The $(MNL) \times 1$ vector $\boldsymbol{\Phi}(k)$ is defined as $\boldsymbol{\Phi}(k) = [Z_1(k), Z_2(k), \ldots, Z_{(MNL)}(k)]^t$.
Each complex coefficient $Z_i(k) = Z_{c_i}(k) + j Z_{s_i}(k)$ ($i = 1, 2, \ldots, (MNL)$) represents one
of multiple uncorrelated Rayleigh fading waveforms and can be efficiently simulated
by the sum of sinusoids (SoS) method in [17,18].


3  HARDWARE  IMPLEMENTATION  METHOD 

For the convenience of describing the hardware implementation, we define three
new matrices $\mathbf{C} = \mathbf{C}_{ISI}^{1/2}$, $\mathbf{D} = \boldsymbol{\Psi}_{Tx}^{1/2} \otimes \mathbf{C}_{ISI}^{1/2}$, and $\mathbf{E} = \mathbf{C}_h^{1/2}(0)$. The coefficients of the
channel vector $\mathbf{h}_{vec}(k)$ in (3) are rearranged as $\mathbf{h}_{vec}(k) = [H(1,k), H(2,k), \ldots, H(MNL,k)]^t$,
and $H(w,k) = H_c(w,k) + j H_s(w,k)$, where $w = 1, 2, \ldots, (MNL)$.

The proposed MIMO fading channel emulator outputs $\mathbf{h}_{vec}(k)$ for N-by-M
subchannels with $L$ taps per subchannel within a symbol period. Its hardware
implementation consists of five modules: a flat Rayleigh fading generator (FRFG), two
Ping-Pong buffers, a correlation multiplier (CM) module, and an interpolation
module, as shown in Fig. 1. The FRFG module serially generates $(MNL)$ uncorrelated
flat Rayleigh fading waveforms $Z_i(Rk)$ (for $i = 1, 2, \ldots, (MNL)$) with the proper symbol
duration $T_s$, maximum Doppler frequency $f_d$, and decimation rate $R$. Its outputs are
separated into the real part $Z_{c_i}(Rk)$ and the imaginary part $Z_{s_i}(Rk)$.


Figure  1.  Block  diagram  of  proposed  correlated  MIMO  fading  channels  emulator. 


The Ping-Pong buffers save the serial outputs of the FRFG and convert them
into the parallel outputs required by the following CM module. Using
Ping-Pong buffers ensures that only a single FRFG module is needed to provide all
$(MNL)$ uncorrelated Rayleigh fading channel waveforms. The Ping buffer and Pong
buffer work alternately to temporarily store $(MNL)$ uncorrelated fading channel
responses. Two sets of Ping-Pong buffers are employed to buffer the real and
imaginary parts of the uncorrelated complex fading channel responses separately. The
design parameter $R$ is carefully chosen to meet the real-time requirements.

The CM module incorporates three types of correlation functions into the
uncorrelated fading channel responses via the Kronecker product and vector multiplication
in (4). An all-parallel structure is memory demanding, and an all-serial structure is
time consuming, especially when the variables $N$, $M$, and $L$ are
large. The proposed CM module employs a mixed parallel-serial (P-S) method to implement
matrices $D$ and $E$, thus drastically reducing the memory requirement and processing time.
We also exploit the symmetry of the matrix $C_{ISI}^{1/2}$ and employ a symmetric storage
submodule to save approximately half of the memory space.

The interpolator module then linearly interpolates the samples with an
interpolation rate $R$ (equal to the decimation rate) to output symbol-rate fading waveforms.

The hardware implementations of the FRFG and interpolation modules are
similar to those in [19], while the Ping-Pong buffers and CM module are new structures
developed in this work. A brief review of the FRFG and interpolation modules and
detailed structures of the Ping-Pong buffers and CM module are given in the next
few subsections.


3.1  The  FRFG 

One FRFG module is utilized to generate $(MNL)$ independent flat Rayleigh
fading coefficients $Z_i(Rk)$ in series with a downsampling factor $R$. The SoS method
[18] is employed to implement the flat Rayleigh fading waveforms using a random number
generator, a look-up table (LUT) for the sine and cosine functions, multipliers, and
adders, as shown in Fig. 2. The SoS method generates the real and imaginary parts of the
coefficients as sums of $P$ sinusoids


$$
\begin{aligned}
Z_i(k) &= Z_{c,i}(k) + jZ_{s,i}(k), \\
Z_{c,i}(k) &= \sum_{p=1}^{P} \cos\bigl(2\pi (f_d k T_s \cos\alpha_{p,i} + \phi_{p,i})\bigr), \\
Z_{s,i}(k) &= \sum_{p=1}^{P} \cos\bigl(2\pi (f_d k T_s \sin\alpha_{p,i} + \varphi_{p,i})\bigr), \\
\alpha_{p,i} &= \frac{\pi (p - 0.5 + \theta_i)}{2P}, \quad p = 1, 2, \ldots, P,
\end{aligned}
\tag{5}
$$

where $f_d$ is the maximum Doppler frequency, $P$ is the total number of sinusoids, and
$j = \sqrt{-1}$. The angle of arrival $\alpha_{p,i}$ is randomized by $\theta_i$. The random variables
$\phi_{p,i}$ and $\varphi_{p,i}$ are the random phases of the in-phase and quadrature components,
respectively. The random variables $\phi_{p,i}$, $\varphi_{p,i}$, and $\theta_i$ are statistically
independent and uniformly distributed on $[-0.5, 0.5)$ for all $p$.
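
A software model of the SoS generator in (5) can be useful for checking FRFG outputs
offline. The Python sketch below is illustrative only and is not the hardware design:
the function name `sos_rayleigh`, the default of eight sinusoids, and the $1/\sqrt{P}$
scaling (chosen here so that the average power of each waveform is one) are assumptions
introduced for this example. The angles and phases follow the definitions given above.

```python
import math
import random


def sos_rayleigh(num_samples, fd_ts, num_sinusoids=8, rng=None):
    """Return complex fading coefficients Z(k) for k = 0 .. num_samples - 1.

    fd_ts is the normalized Doppler frequency f_d * T_s.
    """
    rng = rng or random.Random()
    P = num_sinusoids
    # theta, phi and varphi are uniform on [-0.5, 0.5), as in the text.
    theta = rng.random() - 0.5
    phi = [rng.random() - 0.5 for _ in range(P)]      # in-phase phases
    varphi = [rng.random() - 0.5 for _ in range(P)]   # quadrature phases
    alpha = [math.pi * (p - 0.5 + theta) / (2 * P) for p in range(1, P + 1)]
    scale = 1.0 / math.sqrt(P)  # assumed normalization: E|Z|^2 = 1

    out = []
    for k in range(num_samples):
        zc = sum(math.cos(2 * math.pi * (fd_ts * k * math.cos(a) + ph))
                 for a, ph in zip(alpha, phi))
        zs = sum(math.cos(2 * math.pi * (fd_ts * k * math.sin(a) + ph))
                 for a, ph in zip(alpha, varphi))
        out.append(complex(scale * zc, scale * zs))
    return out
```

Calling `sos_rayleigh(1000, 0.01)` returns one waveform of 1000 samples; independent
waveforms are obtained by calling the function again so that new random angles and
phases are drawn.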


Figure  2.  Implementation  blocks  of  the  FRFG  module. 


3.2  Ping-Pong  Buffers 

The Ping-Pong buffers synchronize the FRFG module with the CM module
and make it possible for the single FRFG to continuously provide multiple
uncorrelated Rayleigh fading channel responses for the CM module. They perform a
serial-to-parallel data conversion by properly buffering and outputting data. Two identical
sets of Ping-Pong buffers are needed to buffer the real part $Z_{c,i}(Rk)$ and the imaginary
part $Z_{s,i}(Rk)$ separately. Each Ping-Pong buffer contains two banks of RAMs named
Ping RAMs and Pong RAMs. The block diagram of the Ping-Pong buffer storing
the real part of the coefficients is shown in Fig. 3.

Each bank of the Ping-Pong buffer contains $(MN)$ RAM units, and each RAM holds
$L$ words. The inputs $Z_{c,i}(Rk)$ (where $i = 1, 2, \ldots, (MNL)$) are fed to the Ping-Pong
buffer in the following format. In a macro period of $(MNL \cdot L)$ clock cycles, the serial
sequence $Z_{c,1}(Rk), \ldots, Z_{c,MNL}(Rk)$ is input in the first $(MNL)$ clock
cycles, and then all zeros are input in the remaining $(MNL(L-1))$ clock cycles. In
the next macro period, the variable $k$ is increased by one and an updated
sequence is input in the same format. The demultiplexer DEMUX and the up-counter
Counter Sel 1 work together to distribute the coefficients $Z_{c,i}(Rk)$ into different RAM
units. Counter Sel 1 increases by one every $L$ clock cycles to select
one of the $(MN)$ output ports of the DEMUX. Another up-counter, Counter Addr 1,
generates write/read addresses for all RAMs. A pulse with a length of $L$ clock cycles,
generated by a periodic pulse generator, is used as the control signal "wren"
that enables the write/read operations of the RAMs. The pulse is delayed by $(i-1)L$
clock cycles for the $i$-th Ping RAM unit, and by $(MNL \cdot L + (i-1)L)$ clock
cycles for the $i$-th Pong RAM unit. In Fig. 3, some connecting lines between the delay
blocks and their corresponding "wren" ports are omitted to keep the figure readable.

Figure 3. Hardware implementation of the Ping-Pong buffer module. This diagram
shows the data buffer for the real part $Z_{c,i}(Rk)$. The imaginary part uses a similar
buffer structure.

In total, $(MN)$ multiplexers named MUX are used to select the Ping RAMs or
Pong RAMs to be connected to the $(MN)$ parallel output ports named Zc_out_1 ~
Zc_out_MN. These multiplexers are controlled by the selection signal generated by
the up-counter Counter Sel 2. Each output port sequentially outputs the real parts of
$L$ uncorrelated fading channels in the following format. In a period of $(MNL \cdot L)$
clock cycles, the output port Zc_out_i serially outputs the sequence $Z_{c,L(i-1)+1}(Rk)$,
$Z_{c,L(i-1)+2}(Rk)$, ..., $Z_{c,Li}(Rk)$ for $(MNL)$ times. In the next period, the variable $k$
is increased by one and an updated sequence is output in the same format.
These outputs are fed to the CM module, where they are multiplied with the coefficients
of matrix $E$.
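
The serial-to-parallel behaviour of the Ping-Pong buffer can be sketched in software as
follows. This is a behavioural model under simplifying assumptions, not a description of
the RTL: the class name `PingPongBuffer` and its methods are hypothetical, a whole block
of $MNL$ samples is written in one call, and the zero padding and "wren" timing described
above are not modelled. It only shows that sample $L(i-1)+l$ lands in word $l$ of RAM $i$
and that the reader always sees the bank that was filled last.

```python
class PingPongBuffer:
    """Two banks (Ping, Pong) of MN RAM units with L words each."""

    def __init__(self, mn, taps):
        self.mn = mn
        self.taps = taps
        self.banks = [[[0.0] * taps for _ in range(mn)] for _ in range(2)]
        self.write_bank = 0  # 0 = Ping, 1 = Pong

    def write_block(self, serial_samples):
        """Store MN*L serial samples, L consecutive samples per RAM unit."""
        if len(serial_samples) != self.mn * self.taps:
            raise ValueError("expected exactly MN*L samples")
        bank = self.banks[self.write_bank]
        for idx, value in enumerate(serial_samples):
            # idx // L selects the RAM unit (DEMUX), idx % L is the address.
            bank[idx // self.taps][idx % self.taps] = value
        self.write_bank ^= 1  # the other bank is written next time

    def read_parallel(self, address):
        """Return the MN words stored at one address of the last filled bank."""
        bank = self.banks[self.write_bank ^ 1]
        return [ram[address] for ram in bank]
```

With `mn=2` and `taps=3`, writing the block `[1, 2, 3, 4, 5, 6]` makes
`read_parallel(0)` return the first word of each RAM unit, one value per parallel port.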

3.3  Correlation  Multiplier  Module 

The proposed CM module is implemented by the mixed P-S method, as shown
in Fig. 4. It employs $3(MN)$ multipliers, five adders, and two accumulators, all capable
of outputting results within one clock cycle. Two memory banks, RAM_C and RAM_D_i,
store the pre-computed coefficients of matrices $C$ and $D$, respectively. If the size
of matrix $C$ is large, which is often the case in wideband systems, then only its
diagonal and upper-triangular elements are stored to save memory space, thanks to
its symmetric property [3]. The $j$-th row of matrix $C$ is stored in RAM_C with
$(L - j + 1)$ coefficients. The addresses of RAM_C are allocated sequentially, ranging
from 1 to $L(L+1)/2$. Two up-counters, Counter 2 and Counter 3, are used to generate
the proper row and column indices of matrix $C$, and an address convertor converts
these indices into the corresponding read addresses of RAM_C. The address
convertor computes the read addresses by:


$$
\text{Read Address} = (\min\{l_r, l_c\} - 1)\left(L - \frac{\min\{l_r, l_c\}}{2}\right) + \max\{l_r, l_c\}
\tag{6}
$$

where $l_r$ is the row index, $l_c$ is the column index, $L$ is the dimension of matrix $C$,
and $\min\{\}$ and $\max\{\}$ return the minimum and maximum values of their arguments,
respectively. The address convertor and
RAM_C form a storage submodule that implements the symmetric storage method.
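
The symmetric storage scheme can be illustrated with a short Python sketch. It assumes
the layout stated above, namely that row $j$ of $C$ contributes its $(L - j + 1)$ diagonal
and upper-triangular entries and that addresses run sequentially from 1; the address
expression is derived from that layout. The function names `pack_upper` and
`read_address` are hypothetical and introduced only for this example.

```python
def pack_upper(matrix):
    """Row-major list of the diagonal and upper-triangular entries."""
    size = len(matrix)
    return [matrix[r][c] for r in range(size) for c in range(r, size)]


def read_address(l_r, l_c, size):
    """1-based packed address of C[l_r][l_c], with 1-based indices.

    Rows 1 .. lo-1 hold size, size-1, ... entries, which sum to
    (lo-1)*(2*size-lo)/2 + (lo-1); the offset inside row lo is hi-lo+1.
    The two parts combine to (lo-1)*(2*size-lo)/2 + hi.
    """
    lo, hi = min(l_r, l_c), max(l_r, l_c)
    # (lo-1)*(2*size-lo) is always even, so integer division is exact.
    return (lo - 1) * (2 * size - lo) // 2 + hi
```

Because the lower-triangular element $C(l_r, l_c)$ with $l_r > l_c$ maps to the same address
as $C(l_c, l_r)$, only $L(L+1)/2$ words are needed instead of $L^2$.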

The size of matrix $D$ is often small, and the coefficients of each column are stored
separately in RAM_D_1 through RAM_D_MN. The up-counter Counter 1 and an adder
generate the read address for the RAM_D units so that $(MN)$ coefficients are output
simultaneously. If the size $(MN)$ is large, then a memory scheme similar to that of
RAM_C may be adopted for RAM_D_i. In every clock cycle, the output of RAM_C is
multiplied with the outputs of RAM_D_1 ~ RAM_D_MN to obtain $(MN)$ coefficients of
matrix $E$ in parallel.

The vector multiplication in (4) is implemented by multiplying the $(MN)$
coefficients of matrix $E$ with the real and imaginary parts of the $(MN)$ uncorrelated
Rayleigh channel responses stored in the Ping-Pong buffers. The products are added
together for the real and imaginary parts separately, and the two sums are sent to
the two accumulators. In a period of $L$ clock cycles, each accumulator sums its inputs
over the previous $L$ clock cycles to obtain a single output $H_c(w, Rk)$ or $H_s(w, Rk)$.
The outputs of the accumulators are down-sampled by a factor of $L$
before being passed to the interpolation module. Finally, the interpolation module
takes $H_{c/s}(w, R(k-1))$ and $H_{c/s}(w, Rk)$ to produce all coefficients of $h_{vec}(k)$ in real
time. It is worth noting that the Kronecker product can alternatively be computed
by $D = \Psi_{Rx}^{1/2} \otimes C_{ISI}^{1/2}$ first and then $E = \Psi_{Tx}^{1/2} \otimes D$. The proposed mixed
P-S method can implement this case by simply switching the contents of RAM_D and RAM_C.
However, the best implementation is to use RAM_C to store the one with the largest
dimension among $\Psi_{Rx}^{1/2}$, $\Psi_{Tx}^{1/2}$, and $C_{ISI}^{1/2}$, and to use RAM_D for the
Kronecker product of the other two matrices.
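
The mixed P-S computation can be mimicked in software to see why the full matrix $E$
never has to be stored. The sketch below is a minimal model under stated assumptions:
it takes $E = D \otimes C$ with $D$ of size $MN \times MN$ and $C$ of size $L \times L$, maps the
0-based output index $w$ to row `w // L` of $D$ and row `w % L` of $C$, and treats the inner
sum over the columns of $D$ as the $(MN)$ parallel multiplier paths. The function names
`kron` and `mixed_ps_output` are hypothetical.

```python
def kron(d, c):
    """Reference Kronecker product D (x) C as a dense list of rows."""
    return [[d_ab * c_rc for d_ab in d_row for c_rc in c_row]
            for d_row in d for c_row in c]


def mixed_ps_output(d, c, z, w):
    """Compute H(w) = sum_j E[w][j] * z[j] without forming E.

    d: MN x MN matrix, c: L x L matrix, z: list of MN*L complex samples,
    w: 0-based output index in 0 .. MN*L - 1.
    """
    taps = len(c)
    a, l_r = divmod(w, taps)
    acc = 0j
    for l_c in range(taps):       # serial part: one clock cycle per tap
        c_val = c[l_r][l_c]       # single read from RAM_C
        # parallel part: MN multiply paths, one per column of D
        acc += sum(c_val * d[a][b] * z[b * taps + l_c] for b in range(len(d)))
    return acc
```

Each call performs $L$ serial steps with $MN$ products per step, which matches the
accumulation over $L$ clock cycles described above. Comparing the result with a dot
product against `kron(d, c)[w]` is a simple way to check the indexing.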

In contrast to the mixed P-S method, the emulator in [16] employed a serial
method and three small RAMs A, B, and C to store the coefficients of the matrices
$\Psi_{Rx}^{1/2}$, $\Psi_{Tx}^{1/2}$, and $C_{ISI}^{1/2}$. That emulator can meet the real-time requirement
only for a small value of $(MNL)$. The serial method cannot compute fast enough to meet
the real-time requirement when the channel has long CIRs and/or the symbol duration is
short. The mixed P-S method solves this problem. It employs $(MN)$ parallel
computational paths and can compute the Kronecker product $(MN)$ times faster than the
serial method. It also requires significantly less memory and fewer multipliers
than a fully parallel method, which would be faster still. Therefore, the mixed
P-S method achieves the best tradeoff between computational speed and hardware
resource utilization.

3.4  Interpolator  Module 

The interpolator module performs a linear interpolation with a rate $R$ to
generate fading coefficients at the symbol rate. The structure of the interpolator module
is shown in Fig. 5, where the real and imaginary inputs from the correlation
module, $H_c(w, Rk)$ and $H_s(w, Rk)$, are processed separately in parallel under a
common control logic. In every $(MNL)$ basic clock periods (BCPs), the enable control
block lets the counter increase from 0 to $(R-1)$ in the first $R$ BCPs and
hold at $(R-1)$ in the remaining $(MNL - R)$ BCPs. The counter output is
normalized by $1/R$. The real part input, $H_c(w, Rk)$, is delayed by $(MNL)^2$ BCPs, and
the delayed input is then subtracted from the original input. The result is multiplied
by the normalized counter output and then added to the delayed input $H_c(w, R(k-1))$
to obtain the interpolated $H_c(w, k)$. The imaginary part $H_s(w, k)$ is implemented similarly.
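
The arithmetic performed by the interpolator for one coefficient can be written out as a
short Python sketch. It follows the steps described above (difference between the current
and delayed inputs, scaling by the normalized counter value $n/R$, and addition of the
delayed input); the function name `interpolate` is hypothetical, and the timing of the
enable control and delay line is not modelled.

```python
def interpolate(prev, cur, rate):
    """Linearly interpolate from H(w, R(k-1)) = prev towards H(w, Rk) = cur.

    Returns `rate` samples; the counter n runs from 0 to rate - 1, so the
    first output equals prev and cur itself starts the next segment.
    """
    return [prev + (n / rate) * (cur - prev) for n in range(rate)]
```

For example, with `prev = 1.0`, `cur = 3.0`, and `rate = 4`, the four outputs step from
1.0 in increments of 0.5. The same function works on complex values, although the
hardware processes the real and imaginary parts on separate paths.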


Figure  4.  Hardware  implementation  of  CM  module  using  the  mixed  P-S  method.  In 
this  design,  ( MN )  coefficients  of  matrix  E  are  output  in  parallel  per  clock  cycle,  and 
one  row  of  E  is  output  in  every  L  clock  cycles. 


Figure  5.  Implementation  of  the  interpolator  module. 

4  IMPLEMENTATION  EXAMPLES


The proposed MIMO fading channel emulator was implemented on an Altera
Stratix III EP3SL150F1152C2N FPGA/DSP development kit. The clock frequency
in this implementation was $F_{clk} = 50$ MHz, giving a clock period of 20 ns. We
used Quartus II version 9.0, DSP Builder version 9.0, and Matlab Simulink for this
development, and the hardware-in-the-loop (HIL) method for testing. The emulator
examples are available as a free download from the author's website at
http://web.mst.edu/~zhengyr/.

Two MIMO fading channel examples were implemented on the emulator to
evaluate its accuracy and capability. The first example demonstrated
the feasibility of the emulator in underwater communications by emulating a 2-by-6
underwater acoustic channel with long CIRs ($L = 100$) and a long symbol duration
($T_s = 250$ μs). The second example emulated a WiMAX 2-by-2 fading channel with a short
symbol duration ($T_s = 0.8$ μs) and short CIRs ($L = 5$), and showed the emulator to be
suitable for high data rate communication channels. To evaluate the accuracy of the
emulator, the auto/cross-correlation functions of its output waveforms were computed
and compared to the theoretical ones.

4.1  Implementation  Example  I  -  Underwater  Acoustic  Channel 

The 2-by-6 underwater acoustic channel was implemented using the following
configuration. The underwater communication system consisted of two transmit
elements and six hydrophones placed as shown in Fig. 6. The angle of arrival and
the angular spread were 90° and 10°, respectively. The 100-tap power delay profile
ramped up linearly from 0.2 to 1.8 over the first 40 taps, and then fell from 1.8 to
0.27 over taps 40 to 100. Its total power was normalized to one. The Tx and Rx filters
were square-root raised-cosine filters with a roll-off factor of 0.3. The cross-correlation
matrix $C_{ISI}^{1/2}$ was computed according to (17) in [3].


Figure 6. The placement of transmit elements and hydrophones of the underwater
communication system. This is a 2-by-6 MIMO underwater acoustic communication
system where the speed of the acoustic carrier is 1500 m/s and the frequency of the
carrier is 15 kHz. The carrier wavelength is λ = 10 cm.


Other implementation parameters were selected as $M = 6$, $N = 2$, $L = 100$, $T_s = 250$
μs, $f_d = 40$ Hz, and $R = 10$. The square roots of the correlation coefficient matrices
$\Psi_{Tx}$ and $\Psi_{Rx}$ were pre-computed as:

Based on the outputs of the emulator, the auto/cross-correlation functions of
several subchannels, including the auto-correlation of $h_{1,1}(75, k)$, the cross-correlation
between $h_{1,1}(75, k)$ and $h_{1,1}(76, k)$, and the cross-correlation between $h_{1,1}(75, k)$ and
$h_{2,1}(75, k)$, were computed offline and plotted in Fig. 7. According to (19) in [3],
their theoretical correlation functions were 0.7155, 0.1177, and -0.1774 multiplied
by $J_0[2\pi f_d (k_1 - k_2) T_s]$, respectively. As can be seen, the hardware outputs
closely matched the theoretical ones.
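
An offline comparison of this kind can be sketched in Python as follows. The helper names
(`bessel_j0`, `theoretical_correlation`, `empirical_correlation`) are hypothetical, the
Bessel function is evaluated with a truncated power series that is adequate only for
moderate arguments, and the time-average estimator is an assumption about how the
correlation might be estimated from recorded samples; it is not taken from the source.

```python
import math


def bessel_j0(x, terms=40):
    """Zeroth-order Bessel function of the first kind via its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((m + 1) ** 2)
    return total


def theoretical_correlation(rho, fd, ts, lag):
    """rho * J0(2*pi*fd*lag*ts), where lag = k1 - k2 in symbols."""
    return rho * bessel_j0(2.0 * math.pi * fd * ts * lag)


def empirical_correlation(x, y, lag):
    """Time-average estimate of E[x(k + lag) * conj(y(k))] for lag >= 0."""
    n = len(x) - lag
    return sum(x[k + lag] * y[k].conjugate() for k in range(n)) / n
```

For the underwater example, `theoretical_correlation(0.7155, 40.0, 250e-6, lag)` gives the
reference curve for the auto-correlation of $h_{1,1}(75, k)$, against which an estimate
averaged over many trials can be compared.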


[Plot: correlation versus normalized time lag $k f_d T_s$]

Figure 7. Performance of the underwater acoustic fading channel emulator.
Auto-correlation of $h_{1,1}(75, k)$, cross-correlation between $h_{1,1}(75, k)$ and
$h_{1,1}(76, k)$, and cross-correlation between $h_{1,1}(75, k)$ and $h_{2,1}(75, k)$. The channel
index is according to (3). The results are based on hardware outputs of 200 trials
with $2 \times 10^3$ samples per subchannel per trial.


4.2  Implementation  Example  II  -  WiMAX  Channel 

The proposed emulator also implemented the WiMAX 2-by-2 fading channel
example. The implementation parameters were selected as $M = N = 2$, $T_s = 0.8$ μs,
$f_d T_s = 0.001$, and $L = 5$. The angle of arrival, the angular spread, and the Tx and Rx
filters were the same as those used in the underwater example. The distances between
the two transmit elements and between the two receive elements were 12λ and 0.5λ,
respectively. The power delay profile contained three taps and was given by the SUI-3
model [20], which is suitable for mostly flat terrain with moderate tree densities. The
coefficients of $\Psi_{Rx}^{1/2}$, $\Psi_{Tx}^{1/2}$, and $C_{ISI}^{1/2}$ were pre-computed and listed as:

The short symbol duration imposed a stricter real-time requirement:
$2.5 \times 10^7$ complex responses had to be generated per second, that is, $MNL = 20$
responses in every 0.8 μs symbol. The short CIRs reduced the computational time of the
Kronecker product and thus relaxed the real-time requirement to some extent. Taking the
short symbol duration and CIRs into consideration, we set $R = 3$ to meet the real-time
requirement.

Auto/cross-correlation functions of several subchannels, including the
auto-correlation of $h_{1,1}(0, k)$, the cross-correlation between $h_{1,1}(0, k)$ and
$h_{1,1}(1, k)$, and the cross-correlation between $h_{1,1}(0, k)$ and $h_{2,1}(1, k)$, were
computed offline and plotted in Fig. 8. Their theoretical correlation functions were
0.7728, 0.0634, and -0.0193 multiplied by $J_0[2\pi f_d (k_1 - k_2) T_s]$, respectively. As
can be seen, the auto/cross-correlation functions of the hardware outputs matched the
theoretical ones very well.

[Plot: theoretical results and hardware output results versus normalized time lag $k f_d T_s$]

Figure 8. Performance of the WiMAX fading channel emulator. Auto-correlation
of $h_{1,1}(0, k)$, cross-correlation between $h_{1,1}(0, k)$ and $h_{1,1}(1, k)$, and
cross-correlation between $h_{1,1}(0, k)$ and $h_{2,1}(1, k)$. The channel index is according
to (3). The results are based on hardware outputs of 50 trials with $2.8 \times 10^4$ samples
per subchannel per trial.


5  PERFORMANCE  EVALUATION 


In addition to accuracy, we evaluated other performance metrics of the proposed
emulator, including speed and hardware usage. The speed of this emulator was compared
to that of the emulator in [16], which employed a serial method. Moreover, the multiplier
and memory utilization of the mixed P-S and serial methods were analyzed and
compared. Finally, parameter specifications and detailed hardware usage of the proposed
emulator are presented.

5.1  Performance  Comparison  of  Serial  and  Mixed  P-S  Methods 

The proposed emulator with the mixed P-S method can compute $C_h^{1/2}(0)$ and
generate correlated complex fading responses much faster than its counterpart in
[16] with the serial method. The cost is higher hardware utilization, especially of
multipliers, which are used to construct multiple computational paths. The speed
comparison for typical values of $M$, $N$, and $L$ is shown in Fig. 9(a), which clearly
demonstrates that the mixed P-S method saves a large amount of time. The y-axis
indicates the number of clock cycles required to generate one correlated
complex fading response. As can be seen, when the two methods are set to the same
values of $M$, $N$, and $L$, the mixed P-S method is $(MN)$ times faster
than the serial method. Note that the serial method requires more clock cycles when
either $(M \times N)$ or $L$ increases, whereas the mixed P-S method requires more clock
cycles only when $L$ increases.

The mixed P-S method uses more multipliers to construct parallel computational
paths in the CM module, while the serial method uses a small constant number
of multipliers. The multiplier utilization of the two methods in the CM module is
shown in Fig. 9(b). The serial method employs seven multipliers to implement one
serial computational path irrespective of the values of $M$, $N$, and $L$. The mixed P-S
method employs a variable number of multipliers, equal to $(3MN)$.

When the fading channel has long CIRs, the memory usage for storing a large
matrix $C_{ISI}^{1/2}$ can be drastically reduced by making use of the symmetric property
of the matrix. The full storage method needs $L^2$ words, whereas the symmetric storage
method needs only $L(L+1)/2$ words, saving approximately half of the memory.
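
The comparison in this subsection can be condensed into a small cost model. The Python
sketch below only restates the counts given in the text; the function name `cm_costs`
and the dictionary keys are hypothetical, and the serial cycle count of $MN \cdot L$ is
inferred from the statements that the mixed P-S method needs $L$ clock cycles per output
and is $(MN)$ times faster.

```python
def cm_costs(m, n, taps):
    """Clock cycles per output, multipliers, and RAM_C words for both methods."""
    mn = m * n
    return {
        "serial_cycles": mn * taps,      # inferred: MN times slower than mixed P-S
        "mixed_cycles": taps,            # one output every L clock cycles
        "serial_multipliers": 7,         # constant, from the text
        "mixed_multipliers": 3 * mn,     # 3(MN), from the text
        "full_words": taps * taps,       # full storage of C
        "symmetric_words": taps * (taps + 1) // 2,  # diagonal + upper triangle
    }
```

For the underwater example ($M = 6$, $N = 2$, $L = 100$), `cm_costs(6, 2, 100)` reports 100
cycles per output for the mixed P-S method and 5050 words for the symmetric storage of $C$.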

5.2  Parameter  Specifications  and  Hardware  Usage 

The proposed MIMO fading channel emulator is flexible in parameter
selection and can be customized to emulate channel scenarios other than the examples
presented here. Table 1 shows the parameter ranges of the emulator with the FPGA
chip clock $F_{clk} = 50$ MHz.

Figure 9. Performance comparison for generating one correlated complex fading
response using the serial method and the proposed mixed P-S method.
(a) Numbers of clock cycles required. (b) Numbers of multipliers needed.

Table 1. Parameter ranges of the proposed emulator with $F_{clk} = 50$ MHz.

Number of Rx and Tx    Number of taps    Normalized Doppler $T_s f_d$    Output speed (samples/sec)
-------------------    --------------    ----------------------------    --------------------------
$(MN) \le 16$          $L \le 100$       $1.9 \times 10^{-6}$ ~ 1        $50 \times 10^6 \times R/L$

According to Table 1, the proposed emulator can emulate any MIMO antenna
array combination of Rx and Tx up to $(MN) = 16$, including 2×2, 2×8, 3×3, 4×4, and
so on. The maximum number of channel taps, $L = 100$, covers most practical long
CIR fading channels, including underwater acoustic channels. The proposed emulator
stores the normalized Doppler frequency $T_s f_d$ in the Q1.19 format to ensure a high
accuracy of $2^{-19} \approx 1.9 \times 10^{-6}$. The emulator can generate $F_{clk} R / L$ complex
samples per second. Each complex sample consists of real and imaginary parts represented
in the Q4.14 format. For the underwater acoustic channel with $T_s = 250$ μs, the
real-time requirement can be met by setting $R = 10$. For the WiMAX channel with
$T_s = 0.8$ μs, the real-time requirement can be met by setting $R = 3$. For channels with
smaller symbol durations, the real-time requirement can be met by increasing the
clock frequency and $R$.
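
The choice of $R$ for the two examples can be checked with a few lines of arithmetic. The
Python sketch below assumes that the emulator produces one correlated sample every $L$
clock cycles before interpolation, so that its output rate is $F_{clk} R / L$ complex samples
per second, and that real-time operation requires $MNL$ samples per symbol period $T_s$.
The function name `min_decimation_rate` is hypothetical.

```python
import math


def min_decimation_rate(m, n, taps, ts, f_clk=50e6):
    """Smallest integer R with f_clk * R / taps >= m * n * taps / ts."""
    required = m * n * taps / ts  # complex responses needed per second
    return max(1, math.ceil(required * taps / f_clk))
```

Under these assumptions the WiMAX example ($M = N = 2$, $L = 5$, $T_s = 0.8$ μs) needs
$2.5 \times 10^7$ responses per second, and the underwater example ($M = 6$, $N = 2$, $L = 100$,
$T_s = 250$ μs) needs $4.8 \times 10^6$; the function returns the same values of $R$ that were
used in the two implementations.
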

The hardware usage of the previous two implementation examples is summarized
in Table 2, where ALUT, DLR, BM, DSP, and LU denote adaptive look-up table,
dedicated logic register, block memory, DSP block (high-speed 18-bit multiplier), and
overall logic utilization, respectively. Compared to the WiMAX example, the underwater
example employs more hardware resources. In particular, it uses approximately
twice the BM bits and DSP blocks, owing to the Ping-Pong buffers, the large
RAM_C, and the parallel computational paths. Note that the total logic utilizations of
the two examples are only 14% (underwater) and 12% (WiMAX) of the whole FPGA chip.
The low hardware utilization makes it possible to implement other functional modules
on the same FPGA chip.

Table 2. Resource usage of the two examples on the Stratix III EP3SL150F1152C2N FPGA
with $F_{clk} = 50$ MHz.

Example       ALUT           DLR          BM bits          DSP         LU
----------    -----------    ---------    -------------    --------    ---
Underwater    14845 (13%)    4042 (4%)    1241613 (22%)    78 (20%)    14%
WiMAX         11926 (10%)    3301 (3%)    659407 (12%)     35 (9%)     12%

The capability and hardware usage of the proposed emulator are compared
with those of existing emulators in Table 3. The numbers of LEs, memory blocks,
and DSP elements are based on the WiMAX channel emulator with $MNL = 160$ for
the proposed emulator and the one in [19]. The values of $(MNL)$ for the other emulators
are listed in the table. The proposed emulator offers much higher capability
than the existing ones, while its hardware usage remains
very low.

5.3  Interfacing  with  Digital  Up-Convertor  and  Down-Convertor 

Although the proposed MIMO fading channel emulator was tested by the HIL
modules via Simulink, it can be easily integrated with digital up-convertors and
down-convertors to generate intermediate frequency (IF) channel waveforms. The
IF channel waveforms can be further converted via analog mixers to generate RF
channel waveforms. Altera provides several ready-made digital IF convertors for the
Stratix III DSP development kit as DSP Builder Simulink models [21]. The Stratix III
DSP development kit has two HSMC interfaces that can connect to two daughter
boards, each having two ADCs and two DACs, so a 4-by-4 MIMO channel with IF
waveforms can be easily integrated.


Table 3. Resource usage comparisons of related fading channel emulators.

                                     Proposed            Emulator            Emulator            Emulator            Emulator
                                     Emulator            in [11]             in [19]             in [14]             in [15]
---------------------------------    ----------------    ----------------    ----------------    ----------------    ----------------
Logic Unit                           15227 (LE)          3557 (LC)           44240 (LE)          22272 (LE)          46357 (LC)
Block Memory                         659407              Unknown             1920089             Unknown             440960
DSP Element                          35                  Unknown             194                 Unknown             136
Rx×Tx                                M×N (1)             1×1                 [illegible]         4×4                 4×4
Number of Taps                       L (1)               3                   L (2)               9                   Unknown
On-chip $C_{ISI}^{1/2}$ Calculator   No                  No                  Yes                 No                  No
Temporal Correlation                 SoS                 Spectrum filtering  SoS                 Spectrum filtering  SoS
Inter-tap Correlation                Yes                 Yes (3)             Yes (4)             No                  Unclear
Spatial Correlation                  Yes                 No                  Yes                 No                  Unclear

Notes:
(1) The numbers of Rx, Tx, and taps satisfy $(MNL) \le 1600$.
(2) The numbers of Rx, Tx, and taps satisfy $(MNL) \le 160$.
(3) The inter-tap correlation is implemented by upsampling to the pass band.
(4) The inter-tap correlation matrix $C_{ISI}^{1/2}$ is calculated on chip.


6  CONCLUSIONS 


A wideband MIMO fading channel emulator with accurate correlation properties
has been proposed. The emulator employs a novel mixed P-S method to increase
the speed of incorporating correlation functions. This improvement makes the
emulator capable of emulating MIMO fading channels with a high data rate, large MIMO
size, and long CIRs in real time. Two MIMO fading channel examples, an underwater
acoustic channel and a WiMAX channel, have been implemented on an Altera Stratix III
FPGA/DSP development kit and evaluated in terms of accuracy, speed, and hardware usage.
The results show that the proposed emulator uses few hardware resources and can
generate accurate MIMO fading channel responses in real time.

PAPER

II.  A  NOVEL  EMULATOR  FOR  DISCRETE-TIME  MIMO 
TRIPLY-SELECTIVE  FADING  CHANNELS 

Fei  Ren  and  Yahong  Rosa  Zheng 

Abstract: Hardware implementation of discrete-time triply selective Rayleigh fading
channel emulators is proposed for multiple-input multiple-output (MIMO)
communications. The proposed work differs from existing ones in that it incorporates
temporal correlation, inter-tap correlation, and spatial correlation matrices into
multiple uncorrelated frequency-flat Rayleigh fading waveforms to obtain a triply selective
fading channel. The flat fading waveforms with temporal correlation or Doppler
spectrum are generated using a Sum-of-Sinusoids (SoS) method. The inter-tap correlation
matrix associated with multipath delay spread is computed according to the
channel power delay profile and transmit/receive filters. The spatial correlation matrices,
including the transmit correlation and receive correlation matrices, are predefined
inputs associated with antenna arrangements. The square roots of the three correlation
matrices are computed via Singular Value Decomposition (SVD) and then combined
in real time via Kronecker product with the flat fading waveforms. Several fading
channel examples are implemented on an Altera Stratix III EP3SL150F FPGA DSP
development kit with fixed-point arithmetic. A 4 × 4 MIMO triply-selective channel
with 10 correlated delay-taps per sub-channel utilizes one third of the hardware
resources of the FPGA chip. The statistical and correlation properties of the emulated
fading waveforms match those of the software-based simulators and the theoretical
ones. The proposed method achieves a good balance between computational complexity
and resource utilization.

1  INTRODUCTION


Wireless  fading  channel  modeling,  simulation,  and  emulation  are  important 
topics  in  communications  because  they  provide  a  fast  and  low-cost  method  for  test¬ 
ing  and  verifying  new  algorithm  design,  transceiver  performance,  and  channel  ca¬ 
pacity  analysis  [1,2,  3,4],  To  generate  correct  and  realistic  fading  waveforms,  it  is 
important  that  channel  models  reproduce  accurate  properties  of  actual  propagation 
environments.  One  significant  statistical  property  for  wireless  fading  channel  models 
is  the  correlation  of  fading  channel  waveforms.  For  a  multiple-input  multiple-output 
(MIMO)  system,  three  types  of  correlation  functions,  namely  temporal  correlation, 
inter-tap  correlation,  and  spatial  correlation,  need  to  be  taken  into  consideration  since 
a  practical  MIMO  fading  channel  is  usually  time-selective  (described  by  the  temporal 
correlation),  frequency-selective  (exhibiting  inter-tap  correlation),  and  space-selective 
(associated  with  transmitter  and/or  receiver  spatial  correlation).  This  is  referred  to 
as  the  MIMO  triply  selective  fading  channel  [5].  Incorporating  the  three  types  of 
correlation  functions  into  channel  simulators  or  emulators  is  the  key  and  yet  difficult 
aspect  of  accurately  simulating  MIMO  fading  channels.  The  frequency-flat  Rayleigh 
fading  channel  and  the  frequency-selective  fading  channels  are  two  special  cases  of 
the  MIMO  triply-selective  fading  model  [5],  when  only  the  temporal  correlation,  or 
both  the  temporal  and  inter-tap  correlation  functions  are  involved,  respectively.  It 
is  worth  noting  that  the  most  commonly  used  Wide-Sense  Stationary  Uncorrelated 
Scattering (WSSUS) channel model [1] assumes uncorrelated scatterers in the passband,
which leads to inter-tap correlated fading waveforms in the baseband equivalent
channel due to the bandlimited nature of wireless systems [6].

Software-based  channel  simulators  usually  employ  general-purpose  processors 
and floating-point arithmetic to generate fading channel impulse responses. Flat
Rayleigh fading channels can be simulated by one of two methods: the spectrum
filtering method [7,8,9] and the Sum-of-Sinusoids (SoS) method [10,11,12,13]. The
spectrum filtering method shapes the spectrum of a white Gaussian waveform using
a filter whose transfer function equals the square root of the power spectral
density (PSD) of the desired fading process. Doppler spectrum filtering may be
implemented by FIR filters [7] or IIR filters [8,9]. It can simulate fading channels having
various PSD shapes and reach accurate statistical properties in every trial. The SoS
method sums a finite number of sinusoidal waveforms whose amplitudes, frequencies,
and phases are selected to reproduce the desired statistical
properties and Doppler spectra. It is computationally efficient and flexible in parameter
reconfiguration (such as the maximum Doppler frequency). Frequency-selective
fading channels incorporating inter-tap correlation are more difficult to simulate because
the channel is modeled as a time-varying system with a 2-D scattering function.
The Doppler frequency is often much smaller than the symbol rate, while the WSSUS
delays are often at fractional spacings of the symbol interval. Many approaches
to discrete-time frequency-selective channel simulation are found in the literature,
including the delay-weight-and-sum method [6,14], the correlation matrix multiplication
method [15,5], and the 2-D filtering method [16]. The first one uses fractionally
delayed transmit/receive filter taps to weight and delay multiple uncorrelated flat fading
waveforms and sums them together according to the power delay profile (PDP).
The  correlation  matrix  multiplication  method  first  computes  the  inter-tap  correlation 
matrix  according  to  the  PDP  and  then  multiplies  the  square  root  of  the  correlation 
matrix  with  multiple  uncorrelated  flat  fading  waveforms.  The  2-D  filtering  method 
filters  multiple  white  Gaussian  processes  by  an  approximating  filter  of  the  delay- 
Doppler spread function using Gaussian quadrature rules. A MIMO triply selective
fading channel model is even more challenging because it must account for the transmit
and receive spatial correlation functions in addition to the temporal and inter-tap correlation
functions. The discrete-time triply-selective channel model proposed in [5] computes
the symbol-spaced inter-tap correlation matrix according to the power delay profile
and incorporates the inter-tap and spatial correlation matrices into multiple uncorrelated
SoS flat fading CIRs via the Kronecker product. Another MIMO channel simulator,
proposed in [17,8], synthesizes correlated vector channels (with user-specified
correlation functions) using the autoregressive (AR) modeling method to shape the
spectrum of uncorrelated white Gaussian processes.

Research on software-based channel simulators provides the basis for the design
of hardware-based channel emulators. Several hardware emulators for frequency-
flat  and  doubly-selective  fading  channels  have  been  reported  in  the  literature.  For 
frequency-flat  Rayleigh  fading  channels,  the  emulator  in  [18]  is  based  on  the  SoS 
method  with  a  modified  random  phase  variable  and  its  hardware  implementation 
uses  reduced  rate  sinusoids  to  achieve  high  speed  and  low  hardware  usage.  Several 
other hardware emulators [19,20] are based on the spectrum filtering method, where
FIR or IIR filtering with single or multiple interpolation stages is used to improve
the accuracy of the narrowband U-shaped Doppler spectrum. For doubly-selective
fading channels, the emulator in [21] uses the spectrum filtering method to generate
multiple uncorrelated flat fading waveforms, converts them to passband signals
(with upsampling), and then combines them as uncorrelated scatterers according to
the power delay profile. Another emulator [22] generates one baseband complex Gaussian
process with Doppler filtering and filters it again in the delay-spread domain to
generate multipaths. Recently, we proposed a hardware emulator [23] based on the
software simulation model in [14], which uses the SoS method to generate multiple
uncorrelated flat fading waveforms, upsamples them to fractionally spaced baseband
signals, and then combines them using weight-delay-and-sum. However, hardware
implementation of MIMO triply selective fading channels still presents challenges
in accurately incorporating correlation functions among the multiple sub-channels.
Currently,  many  so-called  MIMO  emulators  only  implement  multiple  uncorrelated 
flat  fading  channels  in  parallel.  For  example,  the  emulator  in  [24]  outputs  multiple 
uncorrelated  flat  Rayleigh  fading  channels  without  considering  inter-tap  and  spatial 
correlation;  the  emulator  in  [25]  attempts  to  incorporate  only  the  spatial  correlation 
matrices into multiple frequency-flat fading waveforms. To the best of our knowledge,
no hardware-based channel emulator in the literature or among commercial products has
properly implemented all three types of correlation functions of MIMO channels.

In this paper, we propose a hardware implementation method for discrete-time
MIMO triply selective channel emulators. The proposed method implements
the three types of correlation functions of the triply selective channel based on the
software simulator in [5]. The emulator consists of five major functional modules:
random number generator (RNG), frequency-flat Rayleigh fading generator (FRFG),
inter-tap correlation matrix and matrix square root calculation module ($\mathbf{C}_{ISI}^{1/2}$ generator),
correlation multiplier (CM), and interpolator module. The use of the Kronecker
product  in  the  CM  module  saves  a  large  amount  of  hardware  memory  storage  at 
the  expense  of  slightly  increased  computational  complexity.  This  method  achieves 
the  best  tradeoff  between  hardware  resources  and  simulation  speed.  In  addition, 
a mixed parallel-serial architecture is used to meet real-time requirements while reducing
hardware area. For example, the SoS computation for a single flat Rayleigh fading
waveform is performed in parallel, while the generation of multiple flat Rayleigh fading
waveforms and the correlation combination are performed serially. The proposed emulator
is implemented on a Stratix III EP3SL150F FPGA DSP development kit. Several
fading channel examples are provided to demonstrate the accuracy and statistical
properties of the emulator.

The rest of the paper is organized as follows. Section 2 reviews the software-based
model of discrete-time MIMO triply selective fading channels. Section 3 proposes
the hardware implementation method for this model. Section 4 presents several
FPGA hardware implementation examples and evaluates their performance. Section
5 draws the conclusion.


2  DISCRETE-TIME  TRIPLY  SELECTIVE  FADING  MODEL 


We  choose  the  discrete-time  MIMO  triply  selective  fading  model  in  [5]  as  the 
basis  for  our  hardware  implementation.  Consider  a  MIMO  channel  with  P  transmit 
and $O$ receive antennas. The input-output relationship of the channel in the discrete-time
domain is described as

$$\mathbf{y}(k) = \sum_{q=-Q_1}^{Q_2} \mathbf{H}(k,q) \cdot \mathbf{x}(k-q) + \mathbf{v}(k), \tag{1}$$


where the superscript $(\cdot)^t$ is the transpose operator of a matrix or vector,
$\mathbf{x}(k) = [x_1(k), x_2(k), \ldots, x_P(k)]^t$ is the transmitted signal vector,
$\mathbf{y}(k) = [y_1(k), y_2(k), \ldots, y_O(k)]^t$ is the received signal vector,
$\mathbf{v}(k) = [v_1(k), v_2(k), \ldots, v_O(k)]^t$ is the background white
Gaussian noise, $k$ is the time index, and $Q_1$ and $Q_2$ are nonnegative integers representing
the range of delay taps, yielding a total channel length of $Q = Q_1 + Q_2 + 1$.
Note that we assume the symbol interval is $T_s$. The MIMO channel coefficient matrix
$\mathbf{H}(k,q)$ at time instant $k$ and delay tap $q$ is defined by

$$\mathbf{H}(k,q) = \begin{pmatrix} h_{1,1}(k,q) & \cdots & h_{1,P}(k,q) \\ \vdots & \ddots & \vdots \\ h_{O,1}(k,q) & \cdots & h_{O,P}(k,q) \end{pmatrix} \tag{2}$$

For the convenience of implementation, we reshape the matrix $\mathbf{H}(k,q)$ into an
$(OPQ) \times 1$ coefficient vector as

$$\mathbf{h}_{vec}(k) = [\mathbf{h}_{1,1}(k), \ldots, \mathbf{h}_{1,P}(k) \mid \ldots \mid \mathbf{h}_{O,1}(k), \ldots, \mathbf{h}_{O,P}(k)]^t$$

where $\mathbf{h}_{o,p}(k)$ is the coefficient vector of the $(o,p)$-th sub-channel given by
$\mathbf{h}_{o,p}(k) = [h_{o,p}(k,-Q_1), \ldots, h_{o,p}(k,Q_2)]$. Based on the software model in [5], the vector $\mathbf{h}_{vec}(k)$
is computed by

$$\mathbf{h}_{vec}(k) = \mathbf{C}_h^{1/2}(0) \cdot \boldsymbol{\Phi}(k) = \left(\boldsymbol{\Psi}_{Rx}^{1/2} \otimes \boldsymbol{\Psi}_{Tx}^{1/2} \otimes \mathbf{C}_{ISI}^{1/2}\right) \cdot \boldsymbol{\Phi}(k) \tag{3}$$

where $\otimes$ denotes the Kronecker product and $\mathbf{X}^{1/2}$ is the square root of matrix $\mathbf{X}$
such that $\mathbf{X} = \mathbf{X}^{1/2} \cdot (\mathbf{X}^{1/2})^h$, with the superscript $(\cdot)^h$ being the Hermitian operator.
The spatial correlation matrices, $\boldsymbol{\Psi}_{Rx}$ and $\boldsymbol{\Psi}_{Tx}$, are associated with the receiver and
transmitter antennas, respectively; $\mathbf{C}_{ISI}$ is the inter-tap covariance matrix, which
causes intersymbol interference (ISI); and $\boldsymbol{\Phi}(k)$ is an $(OPQ) \times 1$ vector for each time
index $k$.

Define $\boldsymbol{\Phi}(k) = [Z_1(k), Z_2(k), \ldots, Z_{OPQ}(k)]^t$, where $Z_i(k)$ is one of the uncorrelated
flat Rayleigh fading waveforms. The multiple uncorrelated flat Rayleigh fading
waveforms $Z_i(k)$ can be efficiently simulated by one of the SoS models [12], and a
typical one is

$$\begin{aligned}
Z_i(k) &= Z_{c,i}(k) + j Z_{s,i}(k), \\
Z_{c,i}(k) &= \sqrt{\frac{2}{M}} \sum_{m=1}^{M} \cos\big(2\pi (f_d k T_s \cos\alpha_{m,i} + \phi_{m,i})\big), \\
Z_{s,i}(k) &= \sqrt{\frac{2}{M}} \sum_{m=1}^{M} \cos\big(2\pi (f_d k T_s \sin\alpha_{m,i} + \varphi_{m,i})\big), \\
\alpha_{m,i} &= \frac{\pi (m - 0.5 + \theta_i)}{2M}, \quad m = 1, 2, \cdots, M,
\end{aligned} \tag{4}$$

where $f_d$ is the maximum Doppler frequency, $M$ is the total number of sinusoids, and
$j = \sqrt{-1}$. The angle of arrival $\alpha_{m,i}$ is randomized by $\theta_i$. The random variables
$\phi_{m,i}$ and $\varphi_{m,i}$ are the random phases of the in-phase and quadrature components,
respectively. The random variables $\phi_{m,i}$, $\varphi_{m,i}$, and $\theta_i$ are statistically independent
and uniformly distributed on $[-0.5, 0.5)$ for all $m$.
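The SoS model in (4) can be illustrated with a short floating-point sketch. This is only a software illustration under stated assumptions, not the hardware design: the function name `sos_waveform`, the argument `fd_ts` (the normalized Doppler $f_d T_s$), and the use of Python's `random.Random` as the uniform source are all hypothetical choices made here.

```python
import math
import random

def sos_waveform(M, fd_ts, n_samples, rng):
    """Generate one flat Rayleigh fading waveform Z_i(k), k = 0..n_samples-1, per (4)."""
    # One theta per waveform and 2M phases, all uniform on [-0.5, 0.5).
    theta = rng.uniform(-0.5, 0.5)
    phi = [rng.uniform(-0.5, 0.5) for _ in range(M)]      # in-phase phases
    varphi = [rng.uniform(-0.5, 0.5) for _ in range(M)]   # quadrature phases
    alpha = [math.pi * (m - 0.5 + theta) / (2 * M) for m in range(1, M + 1)]
    scale = math.sqrt(2.0 / M)
    out = []
    for k in range(n_samples):
        zc = scale * sum(math.cos(2 * math.pi * (fd_ts * k * math.cos(a) + p))
                         for a, p in zip(alpha, phi))
        zs = scale * sum(math.cos(2 * math.pi * (fd_ts * k * math.sin(a) + p))
                         for a, p in zip(alpha, varphi))
        out.append(complex(zc, zs))
    return out
```

The phases are expressed in cycles rather than radians, which is why they are multiplied by $2\pi$ inside the cosine; this mirrors the $[-0.5, 0.5)$ range used in the text.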

The spatial correlation matrices, $\boldsymbol{\Psi}_{Rx}$ and $\boldsymbol{\Psi}_{Tx}$, are usually known and specified
by users. The inter-tap covariance matrix, $\mathbf{C}_{ISI}$, is computed according to the
power delay profile (PDP). Define

$$\mathbf{C}_{ISI} = \begin{pmatrix} c(-Q_1,-Q_1) & \cdots & c(-Q_1,Q_2) \\ \vdots & \ddots & \vdots \\ c(Q_2,-Q_1) & \cdots & c(Q_2,Q_2) \end{pmatrix} \tag{5}$$

where $c(q_1,q_2)$ is determined by

$$c(q_1,q_2) = \sum_{n=1}^{N} \sigma_n R_{p_T p_R}(q_1 T_s - \tau_n) R^*_{p_T p_R}(q_2 T_s - \tau_n) \tag{6}$$

and $R_{p_T p_R}(\xi)$ is the convolution of the transmit and receive filters. Note that $N$ is the
number of resolvable paths in the PDP. The superscript $(\cdot)^*$ is the conjugate
operator. The parameters $\sigma_n$ and $\tau_n$ come from the PDP, $G(\tau)$, which is often specified
by standards [26] as

$$G(\tau) = \sum_{n=1}^{N} \sigma_n \delta(\tau - \tau_n) \tag{7}$$


3  HARDWARE  IMPLEMENTATION  METHOD


Our hardware implementation of the discrete-time MIMO triply selective fading
emulator outputs $\mathbf{h}_{vec}(k)$ for $O \times P$ sub-channels in parallel with $Q$ taps per sub-channel.
For the convenience of description, we give the elements of $\mathbf{h}_{vec}(k)$ new indices
by defining $H(l,k) = h_{o,p}(k,q)$, where $l = Q \cdot [(o-1) \cdot P + (p-1)] + (q + Q_1 + 1)$. Therefore,
the channel vector $\mathbf{h}_{vec}(k)$ is converted to $[H(1,k), H(2,k), \ldots, H(OPQ,k)]^t$,
where $H(l,k)$ is the complex fading coefficient with $H(l,k) = H_c(l,k) + jH_s(l,k)$.
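The index mapping above can be checked with a small sketch. The helper names `flat_index` and `unflatten_index` are hypothetical; the forward formula is the one stated in the text, and the inverse is derived from it here.

```python
def flat_index(o, p, q, P, Q, Q1):
    """Map (o, p, q) to the 1-based index l = Q*[(o-1)*P + (p-1)] + (q + Q1 + 1)."""
    return Q * ((o - 1) * P + (p - 1)) + (q + Q1 + 1)

def unflatten_index(l, P, Q, Q1):
    """Inverse mapping: recover (o, p, q) from the 1-based index l."""
    sub, tap = divmod(l - 1, Q)   # sub-channel number and 0-based tap position
    o, p = divmod(sub, P)         # 0-based receive and transmit antenna indices
    return o + 1, p + 1, tap - Q1
```

With $O = P = 2$, $Q_1 = 1$, and $Q_2 = 2$ (so $Q = 4$), the first tap of sub-channel (1, 1) maps to $l = 1$ and the last tap of sub-channel (2, 2) maps to $l = 16 = OPQ$.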

The proposed implementation scheme consists of five major modules: a random
number generator (RNG), a flat Rayleigh fading generator (FRFG), a $\mathbf{C}_{ISI}^{1/2}$ generator,
a correlation multiplier (CM) module, and an interpolator module, as shown
in Fig. 1. Each module implements one or more equations presented in Section 2.


Figure  1.  Block  diagram  of  discrete-time  MIMO  triply  selective  fading  emulator. 


The RNG module is a bank of pseudo-random number generators, which generate
the uniform random variables used in the SoS channel model (4). Note that a set of
the random variables is generated at the beginning of each trial; they are stored
and used for all $k$ (time index).

The FRFG module serially generates a large number of uncorrelated flat
Rayleigh fading waveforms with proper temporal correlation (or Doppler spectrum),
each at a low sampling rate. It takes the random variables from the RNG module as
inputs and computes multiple uncorrelated flat Rayleigh fading responses according
to (4), but only at every $R$-th symbol interval, where $R$ is the decimation factor. This
implementation takes advantage of the fact that the maximum Doppler frequency of typical
fading channels is often much smaller than the symbol rate, so the fading variation within
the channel coherence time is small. This technique reduces the computational complexity while
preserving the accuracy of the channel waveforms.

The $\mathbf{C}_{ISI}^{1/2}$ generator computes the coefficients of the inter-tap correlation matrix
using (5) and (6) and the square root of $\mathbf{C}_{ISI}$ using the Jacobi algorithm for SVD [27].
Note that this module is also used only once at the beginning of each simulation trial,
and the square root matrix $\mathbf{C}_{ISI}^{1/2}$ is stored and used, along with the external inputs
$\boldsymbol{\Psi}_{Rx}^{1/2}$ and $\boldsymbol{\Psi}_{Tx}^{1/2}$, for the entire trial.

The  CM  module  incorporates  the  three  square-root  matrices  with  the  multiple 
uncorrelated  flat  Rayleigh  fading  channel  responses  using  the  Kronecker  product  and 
vector  multiplication.  Note  that  the  three  square  root  matrices  are  saved  in  the 
on-chip  memory  of  the  FPGA  development  board. 

The interpolator module linearly interpolates the triply selective fading channel
waveforms into symbol-spaced samples with an interpolation rate $R$. This module
ensures that the emulator meets real-time requirements.

The $\mathbf{C}_{ISI}^{1/2}$ generator and CM module are novel FPGA hardware implementations
proposed by this paper. Instead of storing a large square root matrix of size
$(OPQ) \times (OPQ)$, this architecture stores the coefficients of three small matrices of
sizes $O \times O$, $P \times P$, and $Q \times Q$, respectively. The memory savings come at the cost of a
slight increase in computational complexity associated with the Kronecker product
calculation. The other three modules used in the proposed scheme are similar to the
ones in [20,25,29] with slight modifications. The following subsections describe
each module in detail, with emphasis on the $\mathbf{C}_{ISI}^{1/2}$ generator and CM modules.

3.1  Random  Number  Generator  and  Flat  Rayleigh  Fading  Generator 

The RNG and FRFG modules work together to generate $(OPQ)$ uncorrelated
flat Rayleigh fading channels using (4). The datapath of the RNG and FRFG modules
is shown in Fig. 2. The FRFG module has a mixed parallel-serial structure,
which generates $(2M)$ sinusoids in parallel and $Z_1(Rk) \sim Z_{OPQ}(Rk)$ serially. The
parallel structure reduces the processing time for computing a single $Z_i(Rk)$ to $\frac{1}{2M}$ of that
required by a serial structure, while a serial structure is a better choice for generating
$Z_1(Rk) \sim Z_{OPQ}(Rk)$ because $(OPQ)$ is a large and reconfigurable number. The serial
structure outputs $Z_1(Rk) \sim Z_{OPQ}(Rk)$ sequentially, which matches the requirement
of the CM module for serial inputs.

The RNG module generates all uniform random variables, including $\phi_{m,i}$, $\varphi_{m,i}$,
and $\theta_i$, where $m$ ranges from 1 to $M$ and $i$ ranges from 1 to $(OPQ)$. It consists of
a set of $(2M+1)$ RNGs: $(2M)$ of them are for $\phi_{m,i}$ and $\varphi_{m,i}$, and one is for $\theta_i$. To
provide sufficient accuracy and randomness, we employ the combined linear feedback
shift register (LFSR) random number generator [29], which has a longer recurrence
period and better randomness and correlation properties than conventional
LFSR RNGs. In our implementation, the outputs of the RNGs are scaled and shifted to
meet the range requirement before being stored in buffers of the FRFG module.

The FRFG module involves a large set of cos and sin functions, whose accuracy
affects the performance of the emulator significantly. We propose a simplified
but accurate look-up-table (LUT) scheme to compute $\cos\alpha_{m,i}$ and $\sin\alpha_{m,i}$. Since
$\theta_i$ are uniformly distributed on $[-0.5, 0.5)$, it follows from (4) that $\alpha_{m,i}$ are uniformly
distributed on $[\frac{\pi(m-1)}{2M}, \frac{\pi m}{2M})$. We build a set of $(2M)$ LUTs named $C_1$, $C_2$, ..., $C_M$,
$S_1$, $S_2$, ..., and $S_M$, each of which has $D_1$ non-overlapping entries. The entries in the
LUT $C_m$ are the cosines of $D_1$ angles spaced $\frac{\pi}{2MD_1}$ apart, starting from $\frac{\pi(m-1)}{2M}$,
while the entries in the LUT $S_m$ are the sines of the same angles. The outputs of the RNG
for $\theta_i$ are rounded and scaled to the range of $[1, D_1]$. Taking these outputs as the read
addresses, the set of LUTs outputs the desired $\cos\alpha_{m,i}$ and $\sin\alpha_{m,i}$ values. This LUT
scheme achieves a very high precision, which is equivalent to implementing an $(MD_1)$-entry
LUT with range $\cos(0) \sim \cos(\frac{\pi}{2})$ or $\sin(0) \sim \sin(\frac{\pi}{2})$. The format of the entries in the
LUTs is the fixed-point Q2.14, which is enough to meet the accuracy requirement.

Figure 2. Implementation blocks of the RNG and FRFG modules.
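The sector-wise LUT idea can be sketched in software as follows. This is a minimal illustration, assuming 0-based read addresses (the text uses the range $[1, D_1]$) and floating-point entries rather than Q2.14; the names `build_sector_luts` and `theta_to_address` are hypothetical.

```python
import math

def build_sector_luts(M, D1):
    """Build C_m and S_m (m = 1..M): D1 cos/sin samples covering the m-th angle sector."""
    step = math.pi / (2 * M * D1)                  # angular spacing of the entries
    C, S = [], []
    for m in range(1, M + 1):
        start = math.pi * (m - 1) / (2 * M)        # lower edge of the m-th sector
        C.append([math.cos(start + d * step) for d in range(D1)])
        S.append([math.sin(start + d * step) for d in range(D1)])
    return C, S

def theta_to_address(theta, D1):
    """Scale theta in [-0.5, 0.5) to a 0-based read address in [0, D1 - 1]."""
    return min(int((theta + 0.5) * D1), D1 - 1)
```

Because each LUT covers only a sector of width $\frac{\pi}{2M}$, the $M$ tables together have the same resolution as a single $(MD_1)$-entry table over $[0, \frac{\pi}{2})$, and the table value differs from the exact $\cos\alpha_{m,i}$ by less than one angular step.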

In the FRFG module, a generator block is employed to output $(f_d R k T_s)$,
where $k$ is generated by an incrementing counter with a proper updating period
and is then multiplied by the constants $f_d$, $R$, and $T_s$. The outputs of this block
are multiplied by $\cos\alpha_{m,i}$ (or $\sin\alpha_{m,i}$) and added to $\phi_{m,i}$ (or $\varphi_{m,i}$) to obtain
$(f_d R k T_s \cos\alpha_{m,i} + \phi_{m,i})$ (or $(f_d R k T_s \sin\alpha_{m,i} + \varphi_{m,i})$). A set of modulo operators
extracts the fractional parts of the inputs and converts them into the read addresses
of the $(2M)$ LUTs named $COS_1$, $COS_2$, ..., and $COS_{2M}$. These LUTs are employed
to compute $\cos(2\pi(f_d R k T_s \cos\alpha_{m,i} + \phi_{m,i}))$ and $\cos(2\pi(f_d R k T_s \sin\alpha_{m,i} + \varphi_{m,i}))$,
respectively. All these LUTs are identical, with $D_2$ entries $\cos(0 : \frac{2\pi}{D_2} : \frac{2\pi(D_2-1)}{D_2})$. The
outputs of these LUTs are summed by two accumulators and multiplied by $\sqrt{2/M}$ to
obtain $Z_{c,i}(Rk)$ and $Z_{s,i}(Rk)$. Taking the corresponding parameters of the $(OPQ)$ sub-channels,
the FRFG module is re-used to generate $(OPQ)$ uncorrelated flat Rayleigh
fading channels. The outputs of the FRFG module are sent to two buffers in the CM
module.
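The modulo-and-lookup step can be sketched as below. This is an illustrative software model only: the table size `D2 = 1024` is an assumed value (the text does not state $D_2$), and `lut_cos_cycles` is a hypothetical name.

```python
import math

D2 = 1024  # assumed table size, chosen here for illustration only
COS_LUT = [math.cos(2 * math.pi * d / D2) for d in range(D2)]

def lut_cos_cycles(x):
    """Approximate cos(2*pi*x) by keeping the fractional part of x as the LUT address."""
    frac = x % 1.0                        # modulo operator: fractional part in [0, 1)
    return COS_LUT[int(frac * D2) % D2]   # final % D2 guards against frac rounding to 1.0
```

Since the phase is expressed in cycles, only its fractional part matters, so a single table covering one period serves every sinusoid regardless of how large $k$ becomes.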

3.2  $\mathbf{C}_{ISI}^{1/2}$  Generator  Module

The $\mathbf{C}_{ISI}^{1/2}$ generator module takes the PDP as input and generates $\mathbf{C}_{ISI}^{1/2}$. It
consists of two submodules: the $\mathbf{C}_{ISI}$ generator module, which computes the coefficients
of $\mathbf{C}_{ISI}$ according to (5) and (6), and the matrix square root (MSR) module,
which finds the matrix square root of $\mathbf{C}_{ISI}$. The datapath of the $\mathbf{C}_{ISI}^{1/2}$ generator is
shown in Fig. 3. The $q_1$ and $q_2$ counters range from $-Q_1$ to $Q_2$ (assuming $Q_1 < Q_2$).
The $q_1$ counter increases by one every $(NQ)$ basic clock periods (BCPs), while the
$q_2$ counter increases by one every $N$ BCPs. Two buffers store the values of $\sigma_n$ and $\tau_n$
and sequentially output them while $n$ increases from 1 to $N$. The outputs of the two
buffers are repeated sequences with a period of $N$ BCPs. Computations of $R_{p_T p_R}(\xi)$
and $R^*_{p_T p_R}(\xi)$ are implemented using a LUT scheme. Since $R_{p_T p_R}(\xi)$ is a real and
even function, the size of the LUT can be reduced by storing only the values corresponding
to $\xi \ge 0$. In our implementation, the LUTs $R_1$ and $R_2$ have the same
$D_3$ entries: the values of $R_{p_T p_R}(\xi)$ at $D_3$ uniformly spaced points $\xi$ starting from 0,
with an upper limit that depends on $\mathrm{MAX}(\tau_n)$. The
read addresses of $R_1$ and $R_2$ are computed from $|q_1 T_s - \tau_n|$ and $|q_2 T_s - \tau_n|$, and the
outputs are multiplied together and then by the corresponding $\sigma_n$. The accumulator
following the two multipliers sums the $N$ inputs to obtain one coefficient $c(q_1,q_2)$
every $N$ BCPs. The $\mathbf{C}_{ISI}$ generator module sequentially outputs the coefficients in
row-wise order.

Figure 3. Implementation blocks of the $\mathbf{C}_{ISI}^{1/2}$ generator module.
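The coefficient computation in (6) can be sketched in software as follows. The sketch makes two assumptions that are not taken from the text: the combined transmit/receive response $R_{p_T p_R}(\xi)$ is modeled as an ideal raised-cosine pulse (the result of cascading two matched square-root raised cosine filters), and the filter group delay is ignored. The names `raised_cosine` and `c_isi` are hypothetical.

```python
import math

def raised_cosine(t, Ts, beta):
    """Raised-cosine pulse; assumed here as the combined Tx/Rx filter response."""
    x = t / Ts
    if abs(x) < 1e-12:
        return 1.0
    denom = 1.0 - (2.0 * beta * x) ** 2
    if abs(denom) < 1e-9:                      # removable singularity at |t| = Ts/(2*beta)
        y = math.pi / (2.0 * beta)
        return (math.pi / 4.0) * math.sin(y) / y
    return math.sin(math.pi * x) / (math.pi * x) * math.cos(math.pi * beta * x) / denom

def c_isi(sigma, tau, Ts, Q1, Q2, beta):
    """Inter-tap covariance matrix: c(q1, q2) = sum_n sigma_n R(q1*Ts - tau_n) R(q2*Ts - tau_n)."""
    taps = range(-Q1, Q2 + 1)
    return [[sum(s * raised_cosine(q1 * Ts - t, Ts, beta)
                   * raised_cosine(q2 * Ts - t, Ts, beta)
                 for s, t in zip(sigma, tau))
             for q2 in taps]
            for q1 in taps]
```

The nested loops follow the same order as the hardware counters: the inner sum over the $N$ paths yields one coefficient, $q_2$ advances fastest, and the coefficients emerge row by row.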

The MSR module employs the Eigenvalue Decomposition (EVD) method [27]
to find the matrix square root of $\mathbf{C}_{ISI}$. According to (5) and (6), $\mathbf{C}_{ISI}$ is always
a symmetric positive definite matrix, whose eigenvalues are equal to its singular
values. Decomposing $\mathbf{C}_{ISI}$ into two matrices, $\mathbf{V}_{C_{ISI}}$ and $\mathbf{D}_{C_{ISI}}$, we have
$\mathbf{C}_{ISI} = \mathbf{V}_{C_{ISI}} \mathbf{D}_{C_{ISI}} \mathbf{V}_{C_{ISI}}^h$, where $\mathbf{D}_{C_{ISI}}$ is diagonal with its diagonal elements being the
eigenvalues of $\mathbf{C}_{ISI}$. The square root matrix is then computed as
$\mathbf{C}_{ISI}^{1/2} = \mathbf{V}_{C_{ISI}} \mathbf{D}_{C_{ISI}}^{1/2} \mathbf{V}_{C_{ISI}}^h$.
The MSR module is implemented by an EVD submodule and two matrix multipliers.
The coefficients of $\mathbf{C}_{ISI}$ are sequentially fed via a buffer to the EVD module, and the
eigenvalues are computed by the Jacobi rotation algorithm [27], with implementation
details found in [30,31]. The outputs of the EVD module, $\mathbf{V}_{C_{ISI}}$ and $\mathbf{D}_{C_{ISI}}$, are stored
in separate buffers, while the elements of $\mathbf{D}_{C_{ISI}}$ sequentially pass through a square root
calculator to yield $\mathbf{D}_{C_{ISI}}^{1/2}$. The coefficients of $\mathbf{V}_{C_{ISI}} \mathbf{D}_{C_{ISI}}^{1/2} \mathbf{V}_{C_{ISI}}^h$, or $\mathbf{C}_{ISI}^{1/2}$, are computed using two
matrix multipliers and are output sequentially in row-wise order.
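The EVD-based square root can be sketched with a textbook cyclic Jacobi rotation scheme for real symmetric matrices. This is a generic floating-point illustration, not the fixed-point design of [30,31]; the function name `jacobi_sqrt`, the sweep count, and the tolerance are assumptions made here.

```python
import math

def jacobi_sqrt(C, sweeps=20, tol=1e-14):
    """Return V * sqrt(D) * V^T for a real symmetric positive semidefinite matrix C."""
    n = len(C)
    A = [row[:] for row in C]                              # working copy, driven to diagonal
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < tol:
                    continue
                diff = A[q][q] - A[p][p]
                # Rotation angle that zeroes A[p][q]; |theta| <= pi/4.
                theta = 0.5 * math.atan(2.0 * A[p][q] / diff) if diff != 0.0 else math.pi / 4
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                         # A <- A * J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                         # A <- J^T * A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                         # V <- V * J (accumulate eigenvectors)
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    d = [math.sqrt(max(A[i][i], 0.0)) for i in range(n)]   # square roots of the eigenvalues
    return [[sum(V[i][k] * d[k] * V[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

The final double loop plays the role of the two matrix multipliers: it forms $\mathbf{V}\mathbf{D}^{1/2}$ and multiplies by $\mathbf{V}^t$ in one pass. Clamping slightly negative diagonal values to zero is a numerical safeguard added in this sketch.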


3.3  Correlation  Multiplier  Module 

The CM module incorporates the inter-tap and spatial correlation matrices into
the multiple uncorrelated flat Rayleigh fading waveforms generated by the FRFG. It
consists of two submodules: the Kronecker product (KP) module, which computes
$\mathbf{C}_h^{1/2}(0) = \boldsymbol{\Psi}_{Rx}^{1/2} \otimes \boldsymbol{\Psi}_{Tx}^{1/2} \otimes \mathbf{C}_{ISI}^{1/2}$, and the vector multiplier (VM) module, which implements
$\mathbf{C}_h^{1/2}(0) \cdot \boldsymbol{\Phi}(k)$. Although the coefficients of $\mathbf{C}_h^{1/2}(0)$ are fixed values that can be
pre-computed by software and stored in hardware memory, the pre-compute-and-store
method consumes a large amount of hardware memory for storing $\mathbf{C}_h^{1/2}(0)$, especially
when $(OPQ)$ is large. Our proposed method employs the KP module to compute
$\mathbf{C}_h^{1/2}(0)$ in real time, and its results are input to the next module without being stored. The
datapath of the CM module is shown in Fig. 4.

The KP module consists of three random access memories (RAMs), six counters,
and a few multipliers and adders. RAM A, RAM B, and RAM C store the
coefficients of the three small matrices, $\boldsymbol{\Psi}_{Rx}^{1/2}$ $(O \times O)$, $\boldsymbol{\Psi}_{Tx}^{1/2}$ $(P \times P)$, and $\mathbf{C}_{ISI}^{1/2}$ $(Q \times Q)$,
in row-wise order. The counters, multipliers, and adders work together to generate
the proper read addresses for the three RAMs. The counters have different clock periods
(integer multiples of one BCP) and different moduli, $O$, $P$, and $Q$. Two multipliers
are employed to multiply the outputs of the three RAMs together. Their results are the
coefficients of the matrix $\mathbf{C}_h^{1/2}(0)$ in row-wise order.

In the VM module, the two buffers storing $Z_{c,i}(Rk)$ and $Z_{s,i}(Rk)$ are read out in
the proper order to be multiplied by the corresponding coefficients of $\mathbf{C}_h^{1/2}(0)$. The buffer
storing $Z_{c,i}(Rk)$, for example, repeatedly outputs the sequence $Z_{c,1}(Rk)$, $Z_{c,2}(Rk)$, ...,
$Z_{c,OPQ}(Rk)$ a total of $(OPQ)$ times for each time index $k$. The timing of the KP module,
which produces the coefficients of $\mathbf{C}_h^{1/2}(0)$,
is aligned with the outputs of the two buffers to ensure correct multiplication. The
accumulators sum the $(OPQ)$ products every $(OPQ)$ BCPs, and the final results are
down-sampled by the same rate to yield $H_c(l,Rk)$ or $H_s(l,Rk)$. Therefore, for each
time index $k$, it takes $(OPQ)$ BCPs to output a single $H_c(l,Rk)$ and $H_s(l,Rk)$, and
$(OPQ)^2$ BCPs to output all results of $H_c(l,Rk)$ and $H_s(l,Rk)$ for $l = 1, \cdots, OPQ$.
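The on-the-fly Kronecker product and the multiply-accumulate of the VM module can be sketched as follows. The sketch uses 0-based indices and plain Python lists; `kron_coeff` and `correlate` are hypothetical names, and `A`, `B`, `C` stand for the contents of RAM A, RAM B, and RAM C.

```python
def kron_coeff(A, B, C, row, col):
    """Coefficient (row, col) of A (x) B (x) C, read directly from the three small matrices."""
    P, Q = len(B), len(C)
    ro, rem = divmod(row, P * Q)      # row index into A, remainder for B and C
    rp, rq = divmod(rem, Q)           # row indices into B and C
    co, rem = divmod(col, P * Q)      # column index into A, remainder for B and C
    cp, cq = divmod(rem, Q)           # column indices into B and C
    return A[ro][co] * B[rp][cp] * C[rq][cq]   # two multiplications per coefficient

def correlate(A, B, C, z):
    """Accumulate h[l] = sum_i coeff(l, i) * z[i] without storing the large matrix."""
    n = len(A) * len(B) * len(C)
    return [sum(kron_coeff(A, B, C, l, i) * z[i] for i in range(n)) for l in range(n)]
```

The `divmod` calls correspond to the counters with moduli $O$, $P$, and $Q$: each coefficient of the large matrix is regenerated from three small-table reads each time it is needed, which trades a little arithmetic for a large reduction in memory.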

3.4  Interpolator  Module 

The  interpolator  module  performs  a  linear  interpolation  with  a  rate  R  to  meet 
the  real-time  requirements.  The  datapath  of  the  interpolator  module  is  shown  in 
Fig.  5.  The  inputs  from  the  CM  module,  Hc(l,Rk)  and  Hs(l,Rk),  are  delayed  by 
$(OPQ)^2$ BCPs and become $H_c(l,R(k-1))$ and $H_s(l,R(k-1))$, respectively. An enable
control block holds “HIGH” for $R$ BCPs and then changes to “LOW” for $(OPQ - R)$
BCPs. Therefore, the output of the counter increases from 0 to $(R-1)$ in the first
$R$ BCPs and holds $(R-1)$ for the remaining $(OPQ - R)$ BCPs in every $(OPQ)$ BCPs.
The counter output is multiplied by $1/R$ as well as by $H_c(l,Rk) - H_c(l,R(k-1))$ and
$H_s(l,Rk) - H_s(l,R(k-1))$, respectively. The results are added to $H_c(l,R(k-1))$
and $H_s(l,R(k-1))$, respectively, to obtain $H_c(l,k)$ and $H_s(l,k)$ as the final outputs.
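The interpolation rule described above amounts to the following sketch, where `interpolate_segment` is a hypothetical name and the arguments stand for $H(l,R(k-1))$ and $H(l,Rk)$. It works for real or complex samples.

```python
def interpolate_segment(h_prev, h_next, R):
    """Return R symbol-spaced samples h_prev + (r/R) * (h_next - h_prev), r = 0..R-1."""
    step = (h_next - h_prev) / R          # (H(l, Rk) - H(l, R(k-1))) * (1/R)
    return [h_prev + r * step for r in range(R)]
```

For example, interpolating from 0.0 to 1.0 with $R = 4$ gives 0.0, 0.25, 0.5, and 0.75; the end point itself is the first sample of the next segment.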


4  EXAMPLES  AND  PERFORMANCE  EVALUATION 


The proposed discrete-time MIMO triply selective fading channel emulator was
implemented on an Altera Stratix III EP3SL150F1152C2N FPGA/DSP development
kit. The basic clock frequency of the FPGA chip was $F_{clk} = 50$ MHz, which provided
BCP = 20 ns. We used Quartus II version 8.0, DSP Builder version 5.0, and Matlab
Simulink for this development and the hardware-in-the-loop (HIL) test.

Figure 4. Implementation of the correlation multiplier module.

4.1  $\mathbf{C}_{ISI}^{1/2}$  Generator  Performance  Evaluation

As an example, the $\mathbf{C}_{ISI}^{1/2}$ generator module was implemented to compute the
coefficients for the typical urban channel model with a 12-tap PDP, as shown in Fig. 6.
This is a commonly used channel model specified in the 3GPP standard [26, pg. 70].
The transmit and receive filters were square-root raised cosine (SRC) filters with
a roll-off factor of 0.3 and a group delay of $3T_s$, where $T_s = 3.69$ μs. We configured $Q_1 = 4$
and $Q_2 = 5$; therefore, the size of $\mathbf{C}_{ISI}^{1/2}$ was $10 \times 10$. The coefficients of $\mathbf{C}_{ISI}^{1/2}$ in three
fixed-point formats (Q2.6, Q2.10, and Q2.14) were generated by the $\mathbf{C}_{ISI}^{1/2}$ generator
and compared to those of Matlab floating-point computation. Their squared error
per coefficient and mean squared errors (MSE) are shown in Fig. 7, where the x-axis
is the coefficient index. All three fixed-point formats had small MSEs of less than
$-30$ dB. The outputs with Q2.10 and Q2.14 resulted in similar MSEs, but the former
consumed fewer hardware resources. Therefore, the Q2.10 format was selected as a good
tradeoff between performance and cost.

Figure 5. Implementation of the interpolator module.
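The kind of fixed-point versus floating-point comparison described above can be sketched as follows. The sketch assumes that a format written as Q2.n means 2 integer bits (including the sign) and n fractional bits, and it isolates only the rounding of the final coefficients, not the error accumulated inside the hardware datapath. The names `quantize` and `mse_db` are hypothetical.

```python
import math

def quantize(x, frac_bits, int_bits=2):
    """Round x to a signed Q<int_bits>.<frac_bits> value, saturating at the range limits."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits - 1 + frac_bits))        # most negative integer code
    hi = (1 << (int_bits - 1 + frac_bits)) - 1     # most positive integer code
    return max(lo, min(hi, round(x * scale))) / scale

def mse_db(values, frac_bits):
    """Mean squared quantization error of a list of coefficients, in dB."""
    err = [(v - quantize(v, frac_bits)) ** 2 for v in values]
    mse = sum(err) / len(err)
    return 10 * math.log10(mse) if mse > 0 else float("-inf")
```

Applying `mse_db` to a set of floating-point reference coefficients for 6, 10, and 14 fractional bits gives a quick software estimate of how much precision each word length costs before committing to a hardware format.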


4.2  KP  Module  Memory  Usage  Evaluation 

Figure 6. The normalized PDPs for the typical urban channel model presented in the
3GPP standard [26].

Figure 7. The squared error and MSE comparison of coefficients of $C_{ISI}^{1/2}$ between
hardware fixed-point and Matlab floating-point. Note that the scales of the y axes are
different in the sub-figures.

The proposed KP module can save a large amount of memory by computing
$C_h^{1/2}(0)$ in real time without storing it. Instead of pre-computing and storing all
coefficients of $C_h^{1/2}(0)$, the proposed KP method stores only three small matrices,
$\Psi_{Rx}^{1/2}$, $\Psi_{Tx}^{1/2}$, and $C_{ISI}^{1/2}$, and computes the Kronecker product
using a serial structure with five multipliers, six adders, and six counters. The memory
usage of the pre-compute and store method is $(OPQ)^2$ words, a quadratic function of the
matrix size, while the proposed KP method requires only $(O^2 + P^2 + Q^2)$ words. The
comparison for typical values of $O$, $P$, and $Q$ is shown in Fig. 8, which clearly
demonstrates that the memory required by the pre-compute and store method becomes
prohibitively high when the matrix size $(OPQ)$ is large, making the method impractical.
A large amount of hardware memory can be saved by the proposed KP method with only a
slight increase in hardware resource usage, making it a better choice for triply-selective
channel emulation.
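The two word counts quoted above can be compared directly. The following is a minimal sketch; the function names are illustrative and not part of the emulator, and the two (O, P, Q) settings are the example sizes used elsewhere in this paper.

```python
def precompute_words(O, P, Q):
    """Words needed to store the full (OPQ) x (OPQ) matrix C_h^{1/2}(0)."""
    return (O * P * Q) ** 2


def kp_words(O, P, Q):
    """Words needed to store only the three small square-root matrices."""
    return O ** 2 + P ** 2 + Q ** 2


if __name__ == "__main__":
    for O, P, Q in [(2, 2, 4), (4, 4, 10)]:
        full, small = precompute_words(O, P, Q), kp_words(O, P, Q)
        print(f"O={O} P={P} Q={Q}: pre-compute={full} words, KP={small} words")
```

For the 4 x 4 MIMO case with 10 taps, the sketch gives 25600 words for the pre-compute and store method against 132 words for the KP method.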

4.3  Frequency  Selective  Fading  Channel  Example 

A doubly-selective channel emulator was implemented using the proposed
method by setting $\Psi_{Rx}$ and $\Psi_{Tx}$ to identity matrices, so that it
simultaneously generated $(O \times P)$ doubly-selective channels. The implementation
parameters were selected as $M = 16$ (number of sinusoids), $T_s = 3.69\ \mu s$ (symbol
duration), $f_dT_s = 0.001$ (normalized Doppler), $Q = 10$ (channel taps), and $R = 140$
(interpolation rate). The PDPs and transmit/receive filters were the same as those in
Section 4.1.

The performance of this emulator was evaluated by the auto/cross-correlation
of its output waveforms, as shown in Fig. 9. The theoretical autocorrelation function
of $h_{1,1}(k,l)$ is given by $c(l,l)\cdot J_0[2\pi f_d(k_1-k_2)T_s]$, while the theoretical
cross-correlation function of $h_{1,1}(k,l_1)$ and $h_{1,1}(k,l_2)$ is given by
$c(l_1,l_2)\cdot J_0[2\pi f_d(k_1-k_2)T_s]$, where $J_0$ is the zeroth-order Bessel function
of the first kind and the correlation coefficients were $c(0,0) = 0.7794$,
$c(0,1) = 0.1551$, and $c(0,2) = -0.0544$. The auto/cross-correlation functions of the
emulator outputs were computed offline using 50 trials with $2.8\times10^4$ samples per
sub-channel per trial. As can be seen, the results of the hardware outputs closely
matched the theoretical ones.
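The theoretical curves used in this comparison are a correlation coefficient scaled by a Bessel function of the normalized lag. The sketch below evaluates them with a truncated power series for $J_0$; the helper names are illustrative, and the series is an assumption made to keep the example dependency-free (it is only adequate for moderate arguments, roughly below 10).

```python
import math


def bessel_j0(x, terms=40):
    """J0(x) from its power series; adequate for moderate |x| (roughly < 10)."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        # Next series term: multiply by -(x/2)^2 / (m+1)^2.
        term *= -(x / 2.0) ** 2 / (m + 1) ** 2
    return total


def theoretical_correlation(coeff, fd_ts, lag):
    """c(l1, l2) * J0(2*pi*fd*Ts*lag), with lag measured in symbols."""
    return coeff * bessel_j0(2.0 * math.pi * fd_ts * lag)


if __name__ == "__main__":
    # Coefficients and normalized Doppler quoted in the text.
    for name, c in [("c(0,0)", 0.7794), ("c(0,1)", 0.1551), ("c(0,2)", -0.0544)]:
        curve = [theoretical_correlation(c, 0.001, lag) for lag in range(0, 1001, 250)]
        print(name, [round(v, 4) for v in curve])
```

At zero lag each curve equals its coefficient, and all three curves cross zero at the same lags because they share the same Bessel factor.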
Figure 8. The comparison of memory usage between the pre-compute and store
method and the proposed KP method.


4.4  Triply  Selective  Fading  Channel  Example 

Figure 9. Performance of the doubly selective fading channel emulator. Auto-correlation
of $h_{1,1}(k,0)$, cross-correlation between $h_{1,1}(k,0)$ and $h_{1,1}(k,1)$, and between
$h_{1,1}(k,0)$ and $h_{1,1}(k,2)$. The results were based on hardware outputs of 50 trials
with $2.8\times10^4$ samples per sub-channel per trial.

The proposed emulator also implemented the MIMO triply selective channel
example presented in [5]. The sizes of $\Psi_{Rx}$, $\Psi_{Tx}$, and $C_{ISI}$ were
$O = P = 2$ and $Q = 4$, respectively, with the spatial correlation matrices given as


$$\Psi_{Tx} = \begin{pmatrix} 1.0000 & 0.2154 \\ 0.2154 & 1.0000 \end{pmatrix}, \qquad
\Psi_{Rx} = \begin{pmatrix} 1.0000 & -0.3042 \\ -0.3042 & 1.0000 \end{pmatrix}.$$


The PDP was $G(\tau) = A\exp(-\tau/\mu s)$ for $0 \le \tau \le 5\ \mu s$ and zero elsewhere.
The transmit filter was a linearized Gaussian filter with a time-bandwidth product equal
to 0.3, and the receive filter was an SRRC filter with a roll-off factor of 0.3. The matrix
$C_{ISI}$ with $q_1 = -1, 0, 1, 2$ and $q_2 = -1, 0, 1, 2$ was computed by (5), yielding

$$C_{ISI} = \begin{pmatrix}
0.0091 & 0.0426 & 0.0178 & -0.0016 \\
0.0426 & 0.3664 & 0.3407 & 0.0367 \\
0.0178 & 0.3407 & 0.5583 & 0.1414 \\
-0.0016 & 0.0367 & 0.1414 & 0.0602
\end{pmatrix}.$$

The parameters $M$, $f_d$, $T_s$, and $R$ were the same as those used in the doubly selective
fading example. The auto/cross-correlation of several sub-channels was computed and
compared with the theoretical ones and with the software simulation results reported in
[5], as depicted in Fig. 10 and Fig. 11. The theoretical auto/cross-correlations are given by
$c_0(l_1,l_2)\,J_0[2\pi f_d(k_1-k_2)T_s]$, where $c_0(l_1,l_2)$ is the $(l_1,l_2)$-th
coefficient of $C_h(0)$. The results of the emulator outputs were also based on 50 trials,
and they matched the theoretical ones very well.
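As an illustration of how the theoretical coefficients $c_0(l_1,l_2)$ follow from the three correlation matrices of this example, the sketch below builds $C_h(0)$ as a Kronecker product in the order Rx, Tx, inter-tap and reads out a few entries. The helper names and the zero-based row indexing are illustrative choices, not taken from the emulator; the matrix values are those quoted in this example, with taps $q = -1, \ldots, 2$ (so $Q_1 = 1$).

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


PSI_TX = [[1.0, 0.2154], [0.2154, 1.0]]
PSI_RX = [[1.0, -0.3042], [-0.3042, 1.0]]
C_ISI = [[0.0091, 0.0426, 0.0178, -0.0016],
         [0.0426, 0.3664, 0.3407, 0.0367],
         [0.0178, 0.3407, 0.5583, 0.1414],
         [-0.0016, 0.0367, 0.1414, 0.0602]]

# C_h(0) = Psi_Rx (x) Psi_Tx (x) C_ISI, a 16 x 16 matrix for O = P = 2, Q = 4.
C_H = kron(kron(PSI_RX, PSI_TX), C_ISI)


def row_index(o, p, q, P=2, Q=4, Q1=1):
    """Zero-based position of h_{o,p}(k, q) in h_vec(k); o and p are 1-based."""
    return Q * ((o - 1) * P + (p - 1)) + (q + Q1)


def corr_coefficient(sub1, sub2):
    """Theoretical zero-lag correlation between two (o, p, q) sub-channel taps."""
    return C_H[row_index(*sub1)][row_index(*sub2)]


if __name__ == "__main__":
    print(corr_coefficient((1, 1, 1), (1, 1, 1)))  # auto-correlation, tap 1
    print(corr_coefficient((1, 1, 0), (1, 1, 1)))  # same antennas, taps 0 and 1
    print(corr_coefficient((1, 1, 0), (2, 1, 1)))  # different receive antenna
```

Each printed value is the scale factor that multiplies $J_0[2\pi f_d(k_1-k_2)T_s]$ in the corresponding theoretical curve.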

4.5  Evaluation  of  Flat  Rayleigh  Fading  Generators 

Figure 10. Performance of the triply selective channel emulator. Auto-correlation
of $h_{1,1}(k,1)$, cross-correlation between $h_{1,1}(k,0)$ and $h_{1,1}(k,1)$, and between
$h_{1,1}(k,0)$ and $h_{2,1}(k,1)$ (x-axis: normalized time lag $kf_dT_s$). The numbers of
trials and samples are the same as those in Fig. 9.

Figure 11. Performance of the triply selective channel emulator. Cross-correlation
between $h_{1,1}(k,-1)$ and $h_{1,1}(k,1)$, and between $h_{1,1}(k,-1)$ and $h_{1,1}(k,2)$
(x-axis: normalized time lag $kf_dT_s$). Note the change of scale in the y-axis.

The performance of the flat Rayleigh fading generator (FRFG) was analyzed through
the statistical properties of its outputs. The FRFG module had the same parameters $M$,
$T_s$, $f_d$, and $R$ as those of the previous examples. The probability density function
(PDF) of the real/imaginary part of the outputs, the PDF of the envelope, and the
level crossing rate (LCR) are compared with the theoretical ones in Figs. 12-14.
The PDF curves of the hardware outputs matched the theoretical ones very well.
The LCR of the emulator outputs was slightly lower than the theoretical one
at low crossing rates because the number of simulated samples was too small to provide
an accurate LCR count there. These results indicated that the hardware implementation
of the FRFG module had good accuracy.
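For readers who wish to reproduce an LCR comparison of this kind, the sketch below gives the classical closed-form LCR of a Rayleigh envelope under isotropic scattering, $N(\rho)/f_d = \sqrt{2\pi}\,\rho\,e^{-\rho^2}$, together with a simple empirical crossing counter. Whether this exact expression was the reference used for Fig. 14 is an assumption; $\rho$ denotes the envelope level normalized by the RMS envelope, and the function names are illustrative.

```python
import math


def normalized_lcr(rho):
    """Theoretical LCR divided by f_d for a Rayleigh envelope (isotropic scattering)."""
    return math.sqrt(2.0 * math.pi) * rho * math.exp(-rho * rho)


def count_up_crossings(envelope, level):
    """Number of times a sampled envelope crosses `level` in the upward direction."""
    return sum(1 for a, b in zip(envelope, envelope[1:]) if a < level <= b)


if __name__ == "__main__":
    for rho_db in (-20, -10, 0, 5):
        rho = 10.0 ** (rho_db / 20.0)
        print(f"rho = {rho_db:>4} dB: N/f_d = {normalized_lcr(rho):.4f}")
```

An empirical LCR is obtained by dividing the crossing count by the observation time; at very low or very high levels crossings are rare, so a finite record yields only a few counts and a noisy estimate.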

4.6  Parameter  Specifications  and  Hardware  Usage 

The proposed MIMO triply-selective fading emulator is flexible in parameter
selection and can be customized to simulate channel scenarios other than the examples
presented here. Table 1 shows the parameter ranges of the emulator with the
chip clock $F_{clk} = 50$ MHz and symbol duration $T_s = 3.69\ \mu s$. The emulator can
generate triply-selective channels with any PDPs specified in [26]. For systems with
smaller symbol durations, the real-time requirement can be met by increasing the
clock frequency $F_{clk}$ or the interpolation rate $R$ such that $(OPQ)^2/(F_{clk}R) < T_s$.
This ensures that the output rate is $OPQ/T_s$ complex samples per second. The supported
range of the normalized Doppler $T_sf_d$ covers most practical channel scenarios,
in which it is often on the order of $10^{-6}$ to $10^{-2}$. The product $(OPQ)$ is also
limited by $F_{clk}$ in this case. Increasing $F_{clk}$ to 200 MHz while keeping $T_s$ and
$R$ unchanged leads to $\max(OPQ) = 320$, and the on-chip memory is adequate for channels
with $O^2 + P^2 + Q^2 < 10^4$. The proposed emulator stores the value of $T_sf_d$ in the
Q1.19 format to ensure high accuracy. Each output sample is a complex value whose real
and imaginary parts, $H_c(l,k)$ and $H_s(l,k)$, are represented in the Q4.14 format. This
is sufficient to avoid overflow and to provide an accuracy of $10^{-4}$.

Table 1. Parameter ranges of the proposed emulator with $F_{clk}$ = 50 MHz and
$T_s$ = 3.69 $\mu s$.

Number of Rx, Tx, and Taps    Normalized Doppler $T_sf_d$    Complex Samples/s    Output Accuracy
$(OPQ) \le 160$               $1/2^{19} \sim 1$              $OPQ/T_s$            $10^{-4}$

The hardware usage of the MIMO triply-selective fading emulator with $O = P = 4$ and
$Q = 10$ is summarized in Table 2, where ALUT denotes adaptive look-up table, DLR denotes
the dedicated logic register, BM denotes block memory, and DSP means the DSP
blocks (high-speed 18-bit multipliers). The percentage uses of the ALUT, DLR, and
BM were roughly one third of the total hardware resources of the Stratix III FPGA
chip, and the percentage use of the DSP multipliers was about half of the total resource.
It is noted that the $C_{ISI}^{1/2}$ Generator consumes the most hardware resources. If the
$C_{ISI}$ coefficients and the matrix square root calculation were done externally by
software, the percentages of ALUT, DLR, BM, and DSP would drop dramatically to 11%, 2%,
13%, and 13% of the total resources, respectively.
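The real-time condition $(OPQ)^2/(F_{clk}R) < T_s$ and the output rate $OPQ/T_s$ stated in this section can be checked numerically. The sketch below is only an arithmetic check with illustrative function names; it does not model other limits of the design such as on-chip memory.

```python
def meets_real_time(opq, f_clk_hz, interp_rate, ts_s):
    """True if (OPQ)^2 / (F_clk * R) < T_s, i.e. one output vector fits in R symbols."""
    return opq ** 2 / (f_clk_hz * interp_rate) < ts_s


def output_rate(opq, ts_s):
    """Complex output samples per second, OPQ / T_s."""
    return opq / ts_s


if __name__ == "__main__":
    TS, R = 3.69e-6, 140
    print(meets_real_time(160, 50e6, R, TS))   # 4 x 4 MIMO, 10 taps at 50 MHz
    print(meets_real_time(161, 50e6, R, TS))   # one more coefficient at 50 MHz
    print(meets_real_time(320, 200e6, R, TS))  # larger channel at 200 MHz
    print(f"{output_rate(16, TS):.3e} samples/s for O = P = 2, Q = 4")
```

With these numbers, $OPQ = 160$ satisfies the condition at 50 MHz while $OPQ = 161$ does not, and $OPQ = 320$ satisfies it at 200 MHz.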


Table 2. Resource usage of the MIMO triply selective fading emulator on a Stratix III
EP3SL150F1152C2N FPGA with $F_{clk}$ = 50 MHz.

Module                       ALUT           DLR            BM bits          DSP
$C_{ISI}^{1/2}$ Generator    22636          36586          1194944          143
RNG & FRFG                   11988          231            618743           16
CM                           648            1111           10304            10
Other                        120            429            96098            25
Total (percentage)           35392 (31%)    38357 (34%)    1920089 (34%)    194 (51%)

The proposed emulator is also compared to four other related emulators
[21,23,24,25], as shown in Table 3, where LE denotes logic elements in Altera FPGAs
and LC denotes logic cells in Xilinx FPGAs. The LE count is converted from the ALUT
count by ALUT $\approx$ 1.25 LE [32]. Although LE and LC are different, one LE is considered
equivalent to approximately one LC. Even though it requires slightly more hardware,
the proposed emulator implements significantly more functionality than the other
emulators, including the on-chip $C_{ISI}^{1/2}$ calculation and the CM module that
incorporates three correlation matrices into the MIMO fading waveforms.
Table 3. Performance comparisons of related Rayleigh fading channel emulators.

                                      Proposed      Emulator      Emulator      Emulator      Emulator
                                      Emulator      in [21]       in [23]       in [24]       in [25]
Logic Unit                            44240 (LE)    3557 (LC)     19630 (LE)    22272 (LE)    46357 (LC)
Block Memory                          1920089       Unknown       822484        Unknown       440960
DSP Element                           194           Unknown       Unknown       Unknown       136
Rx x Tx                               O x P (1)     1 x 1         1 x 1         4 x 4         4 x 4
Number of Taps                        Q (1)         3             6             9             Unknown
On-chip $C_{ISI}^{1/2}$ Calculator    Yes           No            No            No            No
Temporal Correlation                  SoS           Spectrum      SoS           Spectrum      SoS
                                                    filtering                   filtering
Inter-tap Correlation                 Yes (2)       Yes (3)       Yes (4)       No            Unclear
Spatial Correlation                   Yes           No            No            No            Unclear

Note: (1) The numbers of Rx, Tx, and taps meet the relationship $(OPQ) \le 160$. (2) The
inter-tap correlation matrix $C_{ISI}^{1/2}$ is calculated on chip. (3) The inter-tap
correlation is implemented by upsampling to pass band. (4) The inter-tap correlation is
implemented using baseband upsampling.
Figure 12. PDF of $Z_{c_i}$ and $Z_{s_i}$. The numbers of trials and samples are the same as
in the previous figure.

Figure 13. PDF of $|Z_i|$, where $Z_i = Z_{c_i} + jZ_{s_i}$. The numbers of trials and samples
are the same as in the previous figure.

Figure 14. LCR of $|Z_i|$. The numbers of trials and samples are the same as in the
previous figure.


5  CONCLUSIONS 


A hardware implementation scheme has been proposed for discrete-time MIMO
triply selective fading emulators, which utilizes a mixed parallel-serial structure to
achieve the best tradeoff between hardware usage and output speed. The proposed method
is capable of simulating MIMO triply selective fading channels by combining the
inter-tap and spatial correlation matrices with uncorrelated flat Rayleigh fading
waveforms. The proposed emulator has been implemented on an Altera Stratix III FPGA
development kit and meets the real-time requirement. The hardware outputs exhibit
accurate correlation properties closely matching the theoretical results.
PAPER

III.  HARDWARE  IMPLEMENTATION  OF  TRIPLY  SELECTIVE 
RAYLEIGH  FADING  CHANNEL  SIMULATORS 

Fei  Ren  and  Yahong  Rosa  Zheng 

Abstract- In this paper, we implement a real-time hardware triply selective Rayleigh
fading simulator. This simulator incorporates the inter-tap and spatial correlation
matrices into multiple uncorrelated frequency-flat Rayleigh fading waveforms
(including temporal correlation) to simulate a multiple-input multiple-output (MIMO)
triply selective Rayleigh fading channel. In the correlation incorporation procedure,
this simulator uses a Kronecker product method to save a large amount of hardware
memory. Occupying 34% of the hardware resources of one Stratix III FPGA chip, this
simulator can simulate 4x4 MIMO fading channels with 10 correlated delay taps
per subchannel in real time for a symbol duration of 3.69 $\mu s$. The accuracy of this
simulator is verified by comparing the statistical properties of its outputs to the
corresponding theoretical values, and they match very well.


1  INTRODUCTION 


Wireless fading channel modeling and simulation are very useful for testing
and verifying communication algorithm designs, transceiver products, and channel
capacity analysis. However, software simulators based on general-purpose processors
are slow and struggle to meet real-time simulation requirements. The hardware
simulator, which is based on low-cost FPGA and DSP chips, is a preferred solution
for real-time fading channel simulation.

One significant statistical property of wireless fading channel models is the
correlation of the fading channel waveforms. The subchannels of a MIMO Rayleigh fading
channel are time-selective (described by temporal correlation), frequency-selective
(exhibiting inter-tap correlation), and space-selective (associated with the spatial
correlation of transmitters and receivers). This is referred to as the triply selective
fading channel, which contains three types of correlation.

A discrete-time MIMO triply selective Rayleigh fading channel model and its software
simulation are proposed in [1]. However, the hardware implementation of MIMO triply
selective simulators presents some challenges in accurately computing and incorporating
three types of correlation into the discrete-time model. Currently reported
hardware MIMO simulators do not implement all three types of correlation and may
produce inaccurate channel characteristics. For example, the simulator in [2] outputs
multiple uncorrelated frequency-flat Rayleigh fading waveforms as MIMO subchannels,
while another simulator in [3] attempts to incorporate the inter-tap and spatial
correlation matrices into multiple frequency-flat Rayleigh fading waveforms.

In this paper, we propose a hardware implementation method for the discrete-time
MIMO triply selective fading simulator on a Stratix III FPGA DSP development
kit. This simulator implements all three types of correlation of triply selective
channels. The frequency-flat Rayleigh fading waveforms with temporal correlation or
Doppler spectrum are generated using a Sum-of-Sinusoids (SOS) method. The inter-tap
correlation matrix associated with the multipath delay spread is computed according
to a channel power delay profile (PDP) and the transmit/receive filters. The spatial
correlation matrices, including the transmit correlation and receive correlation matrices,
are pre-defined inputs associated with the antenna arrangements. The matrix square
roots of the correlation matrices are calculated using an eigenvalue decomposition (EVD)
method. They are then combined with multiple uncorrelated frequency-flat Rayleigh
fading waveforms using the Kronecker product and vector multiplication. The results of
the Kronecker product are computed in real time to save hardware memory. The
statistical properties of the simulator outputs are analyzed and compared to the
corresponding theoretical ones for performance evaluation.


2  DISCRETE-TIME  MIMO  TRIPLY  SELECTIVE  RAYLEIGH 

FADING  MODEL 


With accurate statistical properties and computational efficiency for hardware
implementation, the discrete-time MIMO triply selective fading model in [1] is chosen
as the basis of our hardware implementation. In [1], the MIMO channel matrix at
time instant $k$ and delay tap $q$ can be represented as an $(OPQ) \times 1$ coefficient
vector $h_{vec}(k)$, which is defined as

$$h_{vec}(k) = [h_{1,1}(k), \ldots, h_{1,P}(k) \mid \cdots \mid h_{O,1}(k), \ldots, h_{O,P}(k)]^{t} \qquad (1)$$

where $P$ and $O$ are the numbers of transmit and receive antennas, respectively (note that
the sampling interval is assumed to be $T_s$). The vector $h_{o,p}(k)$ is the $(o,p)$-th
subchannel FIR coefficient vector at time instant $k$, which is given by

$$h_{o,p}(k) = [h_{o,p}(-Q_1,k), \ldots, h_{o,p}(q,k), \ldots, h_{o,p}(Q_2,k)] \qquad (2)$$

where $Q_1$ and $Q_2$ are nonnegative integers representing the range of $q$, and
$Q = Q_1 + Q_2 + 1$. In simulation, the vector $h_{vec}(k)$ can be generated by

$$h_{vec}(k) = C_h^{1/2}(0)\cdot\Phi(k) = (\Psi_{Rx}^{1/2} \otimes \Psi_{Tx}^{1/2} \otimes C_{ISI}^{1/2})\cdot\Phi(k) \qquad (3)$$

where $\otimes$ denotes the Kronecker product; $X^{1/2}$ is the square root of matrix $X$,
with $X = X^{1/2}\cdot(X^{1/2})^{h}$; the matrices $\Psi_{Rx}$ and $\Psi_{Tx}$ are the spatial
correlation matrices determined by the receive and transmit antennas, respectively;
$C_{ISI}$ is the inter-tap correlation matrix; $\Phi(k)$ is an $(OPQ) \times 1$ vector with
$\Phi(k) = [Z_1(k), Z_2(k), \ldots, Z_{(OPQ)}(k)]^{t}$, where $Z_i(k) = Z_{c_i}(k) + jZ_{s_i}(k)$
is one of multiple uncorrelated frequency-flat Rayleigh fading waveforms. Each
frequency-flat Rayleigh fading waveform $Z_i(k)$ can be efficiently simulated by
the SOS method proposed in [4].

For the proposed simulator, the square roots of the spatial correlation matrices
$\Psi_{Rx}$ and $\Psi_{Tx}$ are specified by users. The inter-tap correlation matrix
$C_{ISI}$ is denoted as

$$C_{ISI} = \begin{pmatrix}
c(-Q_1,-Q_1) & \cdots & c(-Q_1,Q_2) \\
\vdots & \ddots & \vdots \\
c(Q_2,-Q_1) & \cdots & c(Q_2,Q_2)
\end{pmatrix} \qquad (4)$$

where its coefficients $c(q_1,q_2)$ can be calculated by

$$c(q_1,q_2) = \sum_{n=1}^{N} \sigma_n^2\, R_{p_Tp_R}(q_1T_s-\tau_n)\, R^{*}_{p_Tp_R}(q_2T_s-\tau_n) \qquad (5)$$

where $R_{p_Tp_R}(\xi)$ is the convolution of the transmit filter and the receive filter;
$N$ is the total number of resolvable paths in the PDP; $*$ is the conjugate operator.
The parameters $\sigma_n^2$ and $\tau_n$ are determined by the discrete-time PDP,
$G(\tau) = \sum_{n=1}^{N}\sigma_n^2\,\delta(\tau-\tau_n)$, which is often
specified by communication standards like [5].
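A minimal software sketch of (5) is given below for the case of a real-valued combined filter, where the conjugate has no effect. The ideal Nyquist (sinc) pulse used as the combined Tx/Rx response and the two-path PDP are hypothetical stand-ins chosen only to keep the example self-contained; they are not the filters or profiles used in this paper, and the function names are illustrative.

```python
import math


def sinc_filter(xi, ts):
    """Hypothetical combined Tx/Rx response: an ideal Nyquist (sinc) pulse."""
    x = xi / ts
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def c_isi(powers, delays, ts, q1_max, q2_max, filt=sinc_filter):
    """Inter-tap correlation matrix per (5) for taps q = -Q1, ..., Q2.

    powers[n] and delays[n] play the roles of sigma_n^2 and tau_n.
    """
    taps = range(-q1_max, q2_max + 1)
    return [[sum(p * filt(qa * ts - tau, ts) * filt(qb * ts - tau, ts)
                 for p, tau in zip(powers, delays))
             for qb in taps] for qa in taps]


if __name__ == "__main__":
    TS = 3.69e-6
    # Hypothetical two-path profile: 70% of the power at 0, 30% at half a symbol.
    matrix = c_isi([0.7, 0.3], [0.0, 0.5 * TS], TS, q1_max=1, q2_max=2)
    for row in matrix:
        print(["%.4f" % v for v in row])
```

A path that falls between sampling instants spreads its power over several taps, which is what produces the nonzero off-diagonal entries of $C_{ISI}$.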


3  HARDWARE  IMPLEMENTATION  METHOD 


Our proposed hardware simulator can output $h_{vec}(k)$ in real time. For convenience
of description, we re-index the elements of $h_{vec}(k)$ as

$$H(l,k) = h_{o,p}(q,k) \qquad (6)$$

where $l = Q\cdot[(o-1)\cdot P + (p-1)] + (q + Q_1 + 1)$. Therefore, the vector $h_{vec}(k)$
can be described as

$$h_{vec}(k) = [H(1,k), H(2,k), \ldots, H((OPQ),k)]^{t} \qquad (7)$$

where $H(l,k)$ is a complex fading coefficient and $H(l,k) = H_c(l,k) + jH_s(l,k)$.
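The index mapping in (6) and its inverse can be written out as follows. This is an illustrative sketch; the function names are not from the paper, and the inverse mapping is derived here from (6) rather than stated in the text.

```python
def flat_index(o, p, q, P, Q, Q1):
    """Index l of h_{o,p}(q, k) in h_vec(k), following (6); o, p, l are 1-based."""
    return Q * ((o - 1) * P + (p - 1)) + (q + Q1 + 1)


def unflatten(l, P, Q, Q1):
    """Inverse of flat_index: recover (o, p, q) from l."""
    block, tap = divmod(l - 1, Q)
    o_minus_1, p_minus_1 = divmod(block, P)
    return o_minus_1 + 1, p_minus_1 + 1, tap - Q1


if __name__ == "__main__":
    O, P, Q, Q1 = 2, 2, 4, 1  # taps q = -1, 0, 1, 2
    for o in range(1, O + 1):
        for p in range(1, P + 1):
            print(o, p, [flat_index(o, p, q, P, Q, Q1) for q in range(-Q1, Q - Q1)])
```

The taps of one subchannel occupy $Q$ consecutive positions, and the subchannels are ordered with the transmit index $p$ varying fastest.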

The proposed hardware simulator consists of four major modules, as shown in
Fig. 1. The flat Rayleigh fading generator (FRFG) module generates multiple uncorrelated
frequency-flat Rayleigh fading waveforms $Z_i(Rk)$, which have a decimation rate $R$.
The $C_{ISI}^{1/2}$ generator module computes the inter-tap correlation matrix and its
square root. The correlation multiplier (CM) module implements the Kronecker product and
vector multiplication in hardware to perform (3). The interpolator module linearly
interpolates samples with an interpolation rate $R$ to increase the output speed and meet
the real-time requirement.

Among the four modules, the $C_{ISI}^{1/2}$ generator and the CM module are novel hardware
implementations proposed in this paper. The FRFG and interpolator modules are similar
to those of the SOS simulator in [3] and will not be described here.

The $C_{ISI}^{1/2}$ generator module consists of two submodules: the $C_{ISI}$ generator
module, which computes the coefficients of $C_{ISI}$ according to (4) and (5), and the
matrix square root (MSR) module, which calculates the square root of $C_{ISI}$. The
datapath of the $C_{ISI}^{1/2}$ generator module is shown in Fig. 2. Counters $q_1$ and
$q_2$ are up counters with the same output range, from $-Q_1$ to $Q_2$. Counter $q_1$
increases by one every $(NQ)$ basic clock periods (BCPs), while Counter $q_2$ increases by
one every $N$ BCPs. Two buffers store $\sigma_n^2$ and $\tau_n$, respectively, and output
them sequentially. The results of $R_{p_Tp_R}(\xi)$ and $R^{*}_{p_Tp_R}(\xi)$ are
computed using a lookup table (LUT) scheme. We note that the result of $R_{p_Tp_R}(\xi)$
is always a real value, so that $R_{p_Tp_R}(\xi) = R^{*}_{p_Tp_R}(\xi)$. In addition,
$R_{p_Tp_R}(\xi)$ is an even function, so the size of the LUT can be halved by storing
the result of $R_{p_Tp_R}(\xi)$ only for nonnegative $\xi$. In our implementation, the LUT
$R_1$ stores the results of $R_{p_Tp_R}(\xi)$ for $\xi$ sampled on a grid from 0 to $4T_s$.
The LUT $R_2$ is a copy of $R_1$. If the results of $|q_1T_s - \tau_n|$ and
$|q_2T_s - \tau_n|$ are larger than $4T_s$, $R_1$ and $R_2$ output zeros. If not, they are
converted into the proper read addresses of $R_1$ and $R_2$. The outputs of $R_1$ and
$R_2$ and the corresponding $\sigma_n^2$ are multiplied together. Every $N$ BCPs, the
accumulator sums the $N$ previous inputs to obtain one coefficient of $C_{ISI}$,
$c(q_1,q_2)$. The $C_{ISI}$ generator module sequentially outputs the coefficients of
$C_{ISI}$ in row-wise order.

The MSR module employs the EVD method to find the matrix square root of
$C_{ISI}$ [6]. We note that $C_{ISI}$ is a symmetric positive definite matrix, whose
eigenvalues are always positive. The coefficients of $C_{ISI}$ are stored in a buffer and
sent to the EVD module, which performs the EVD. We employ the Jacobi rotation algorithm
to perform the EVD, since it is a well-known and accurate method for hardware
implementation, the details of which are introduced in [7]. The EVD module outputs three
matrices $V_{C_{ISI}}$, $D_{C_{ISI}}$, and $V_{C_{ISI}}^{-1}$, where
$C_{ISI} = V_{C_{ISI}} D_{C_{ISI}} V_{C_{ISI}}^{-1}$. The coefficients of $D_{C_{ISI}}$
sequentially pass through a square root calculator. Their results are the coefficients of
the matrix $D_{C_{ISI}}^{1/2}$. The coefficients of
$C_{ISI}^{1/2} = V_{C_{ISI}} D_{C_{ISI}}^{1/2} V_{C_{ISI}}^{-1}$ are computed using two
matrix multiplier modules. Eventually, the MSR module sequentially outputs them in
row-wise order.
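The EVD-based square root can be sketched in floating-point software as below, using cyclic Jacobi rotations. This is an illustrative sketch under stated assumptions and not the fixed-point hardware datapath: the sweep limit and tolerance are arbitrary choices, the input is assumed symmetric positive definite, and the orthogonality of the eigenvector matrix is used to replace $V^{-1}$ by $V^{t}$.

```python
import math


def jacobi_eig(A, sweeps=50, tol=1e-14):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) such that A = V diag(eigenvalues) V^T.
    """
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if all(abs(A[p][q]) < tol for p in range(n) for q in range(p + 1, n)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < tol:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = 1.0 / (abs(theta) + math.hypot(theta, 1.0))
                if theta < 0.0:
                    t = -t
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for M in (A, V):  # column update: M <- M J
                    for k in range(n):
                        mkp, mkq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # row update: A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)], V


def matrix_sqrt(A):
    """Symmetric square root V D^{1/2} V^T of a symmetric positive definite matrix."""
    eigenvalues, V = jacobi_eig(A)
    n = len(A)
    roots = [math.sqrt(x) for x in eigenvalues]
    return [[sum(V[i][k] * roots[k] * V[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # 2 x 2 spatial correlation matrix used as a small, clearly positive definite input.
    for row in matrix_sqrt([[1.0, 0.2154], [0.2154, 1.0]]):
        print(["%.6f" % v for v in row])
```

Multiplying the returned matrix by itself recovers the input up to rounding, which is a convenient self-check for any fixed-point implementation as well.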

The CM module incorporates the inter-tap and spatial correlation matrices into
multiple uncorrelated frequency-flat Rayleigh fading waveforms. It consists of two
submodules: the Kronecker product (KP) module, which computes the Kronecker product of
$\Psi_{Rx}^{1/2}$, $\Psi_{Tx}^{1/2}$, and $C_{ISI}^{1/2}$ to obtain $C_h^{1/2}(0)$, and the
vector multiplier (VM) module, which implements $C_h^{1/2}(0)\cdot\Phi(k)$. Our proposed
simulator does not need to store the large matrix $C_h^{1/2}(0)$, but employs the KP
module to compute it in real time. The datapath of the CM module is shown in Fig. 3.
RAM A, RAM B, and RAM C store the coefficients of $\Psi_{Rx}^{1/2}$ $(O \times O)$,
$\Psi_{Tx}^{1/2}$ $(P \times P)$, and $C_{ISI}^{1/2}$ $(Q \times Q)$ in row-wise order.
Several counters, multipliers, and adders work together to generate the proper read
addresses for the three RAMs. The clock periods of Counters 1-6 are integer multiples of
the BCP, while their moduli are related to $O$, $P$, and $Q$. Two multipliers are employed
to multiply the outputs of the three RAMs together. Their results are the coefficients of
the matrix $C_h^{1/2}(0)$ in row-wise order.

Figure 1. Hardware implementation block diagram of the triply selective fading
simulator.

The VM module takes multiple uncorrelated Rayleigh fading waveforms $Z_{c_i}(Rk)$
and $Z_{s_i}(Rk)$ from the FRFG module, and rearranges their order by using two buffers.
Taking the buffer storing $Z_{c_i}(Rk)$ as an example, the buffer stores the sequence
$Z_{c_1}(Rk), Z_{c_2}(Rk), \ldots, Z_{c_{(OPQ)}}(Rk)$, repeatedly outputs it $(OPQ)$ times,
and then does the same for the next sequence
$Z_{c_1}(R(k+1)), \ldots, Z_{c_{(OPQ)}}(R(k+1))$. The outputs of the two buffers are
separately multiplied by the coefficients of $C_h^{1/2}(0)$. Every $(OPQ)$ BCPs, the
accumulator sums the $(OPQ)$ previous inputs to obtain one single $H_c(l,Rk)$ or
$H_s(l,Rk)$. Therefore, it takes $(OPQ)$ BCPs to generate one single $H_c(l,Rk)$ or
$H_s(l,Rk)$, and $(OPQ)^2$ BCPs to generate all $H_c(l,Rk)$ or $H_s(l,Rk)$, where $l$
ranges from 1 to $(OPQ)$ and $k$ is fixed.
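The behavior of the KP and VM modules described above can be mimicked in software by generating each coefficient of the Kronecker product on demand from the three small matrices and accumulating the products, so that the large matrix is never stored. The sketch below is a behavioral illustration only: the index arithmetic stands in for the counter-based address generation, the function names are illustrative, and real-valued inputs are used for brevity.

```python
def kp_element(A, B, C, row, col):
    """Entry (row, col) of A (x) B (x) C, computed from the three small matrices."""
    P, Q = len(B), len(C)
    ra, rem = divmod(row, P * Q)
    rb, rc = divmod(rem, Q)
    ca, rem = divmod(col, P * Q)
    cb, cc = divmod(rem, Q)
    return A[ra][ca] * B[rb][cb] * C[rc][cc]


def kp_matvec(A, B, C, phi):
    """Compute (A (x) B (x) C) * phi one multiply-accumulate at a time."""
    n = len(A) * len(B) * len(C)
    out = []
    for row in range(n):          # one output coefficient H(l, Rk) per row
        acc = 0
        for col in range(n):      # (OPQ) products accumulated per output
            acc += kp_element(A, B, C, row, col) * phi[col]
        out.append(acc)
    return out


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[1]]
    C = [[1, 0], [0, 1]]
    print(kp_matvec(A, B, C, [1, 1, 1, 1]))
```

The doubly nested loop performs $(OPQ)^2$ multiply-accumulate steps per output vector, mirroring the $(OPQ)^2$ BCPs quoted for the VM module, while the storage is limited to the three small matrices.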


4  EXAMPLES  AND  PERFORMANCE  EVALUATION 


The  discrete-time  MIMO  triply  selective  fading  simulator  was  implemented  on  an 
Altera  Stratix  III  EP3SL150F1152C2N  FPGA  DSP  development  kit.  We  used  Quartus  II 
version  8.0,  DSP  Builder  version  5.0,  and  Matlab  Simulink  for  this  development. 

Figure 2. The datapath of the $C_{ISI}^{1/2}$ generator module.

Memory usage evaluation for the KP module was performed. The proposed KP
module computes $C_h^{1/2}(0)$ in real time without storing it. An alternative is the
pre-compute and store method, where the matrix $C_h^{1/2}(0)$ is pre-computed by software
and stored in hardware memory for further access. The memory usage comparison for the two
methods is shown in Fig. 4. The y-axis represents the memory usage on a logarithmic scale,
and the x-axis represents the size of $C_{ISI}^{1/2}$, $Q$. For fixed $O$ and $P$, compared
to the pre-compute and store method, the KP method occupies much less hardware memory,
and its memory usage increases slowly as $Q$ increases.

Figure 3. The datapath of the CM module.

The performance of the simulator was evaluated through a hardware implementation
example with specified parameters. The Matlab simulation and theoretical results with
identical parameters have been reported in [1]. We analyzed the statistical properties of
the hardware outputs and compared them to the theoretical ones. The sizes of $\Psi_{Rx}$,
$\Psi_{Tx}$, and $C_{ISI}$ were $O = P = 2$ and $Q = 4$, respectively. The matrices
$\Psi_{Tx}$ and $\Psi_{Rx}$ were given as follows:

$$\Psi_{Tx} = \begin{pmatrix} 1.0000 & 0.2154 \\ 0.2154 & 1.0000 \end{pmatrix}, \qquad
\Psi_{Rx} = \begin{pmatrix} 1.0000 & -0.3042 \\ -0.3042 & 1.0000 \end{pmatrix}.$$

The PDP was an exponential function for $0 \le \tau_n \le 5\ \mu s$. The transmit filter was
a linearized Gaussian filter with a time-bandwidth product of 0.3, and the receive filter
was an SRRC filter with a roll-off factor of 0.3. Other implementation parameters were
$F_{clock} = 50$ MHz, $T_s = 3.69\ \mu s$, $f_dT_s = 0.001$, and the interpolation rate
$R = 140$.

Figure 4. The memory usage comparison of the pre-compute and store method and
the proposed KP method (x-axis: $Q$).

The proposed hardware simulator met the real-time requirement and output
$4.34\times10^6$ correlated complex fading coefficients per second. When simulated in
Matlab, this fading channel scenario took approximately 1 second to output these
coefficients. All outputs have the fixed-point format Q4.14, which is long enough to
provide high accuracy and avoid overflow. Based on the hardware outputs, the
auto/cross-correlation between several triply selective channels was computed and is
depicted in Fig. 5. The matrix $C_{ISI}$ with $q_1 = -1, 0, 1, 2$ and $q_2 = -1, 0, 1, 2$
was computed as

$$C_{ISI} = \begin{pmatrix}
0.0091 & 0.0426 & 0.0178 & -0.0016 \\
0.0426 & 0.3664 & 0.3407 & 0.0367 \\
0.0178 & 0.3407 & 0.5583 & 0.1414 \\
-0.0016 & 0.0367 & 0.1414 & 0.0602
\end{pmatrix}.$$

Therefore, the three theoretical curves were 0.5583, 0.3407, and -0.1036, each multiplied
by $J_0[2\pi f_d(k_1-k_2)T_s]$. As can be seen, the correlation curves of the hardware
outputs matched them very well.

Figure 5. The auto-correlation of $h_{1,1}(1,k)$, the cross-correlation between
$h_{1,1}(0,k)$ and $h_{1,1}(1,k)$, and the cross-correlation between $h_{1,1}(0,k)$ and
$h_{2,1}(1,k)$ (x-axis: normalized time lag $kf_dT_s$). The channel index is according to
(2). The results are based on hardware outputs of 50 trials with $2.8\times10^4$ samples
in each channel per trial.

We evaluated hardware resource usage using a hardware implementation example
with $O = P = 4$ and $Q = 10$. Hardware usage is summarized in Table 1, where ALUT denotes
adaptive look-up table, DLR is dedicated logic register, BM denotes block memory, and
DSP means the DSP blocks (high-speed 18-bit multipliers). Roughly one third of the ALUT,
DLR, and BM resources of one Stratix III FPGA chip were used, and slightly more than
half of the DSP multipliers were utilized.


Table 1. Hardware usage of the simulator on a Stratix III EP3SL150F1152C2N FPGA
chip.

Module                       ALUT           DLR            BM bits          DSP
$C_{ISI}^{1/2}$ Generator    22636          36586          1194944          143
FRFG                         11988          231            618743           16
CM                           648            1111           10304            10
Other                        120            429            96098            25
Total                        35392 (31%)    38357 (34%)    1920089 (34%)    194 (51%)
5  CONCLUSIONS


A hardware discrete-time MIMO triply selective Rayleigh fading simulator has been
implemented on an Altera Stratix III FPGA DSP development kit. This simulator is capable
of simulating MIMO triply selective fading channels with all three types of correlation in
real time. The outputs of the simulator have been evaluated and shown to exhibit the
expected statistical properties accurately.
PAPER

IV.  A  LOW-COMPLEXITY  HARDWARE  IMPLEMENTATION 
OF  DISCRETE-TIME  FREQUENCY-SELECTIVE 
RAYLEIGH  FADING  CHANNELS 

Fei  Ren  and  Yahong  Rosa  Zheng 

Abstract- A low-complexity hardware implementation method is proposed for discrete-time
frequency-selective Rayleigh fading channels. The proposed method first employs the
Sum-of-Sinusoids method to generate multiple independent flat fading channel responses,
then utilizes a simple weight-delay-sum filtering method to incorporate the
fractionally-delayed multipath rays into inter-tap correlated tap gains. It thus achieves
accurate correlation properties in both inter-tap correlation and temporal correlation
(or Doppler spectrum). The proposed method is implemented on an Altera Stratix II FPGA
development kit, and the results closely match those of MATLAB software simulations.


1  INTRODUCTION 


Wireless fading channel modeling and simulation provide a low-cost means of testing
and verifying transceiver products, designing new algorithms, and analyzing channel
capacity. One of the most commonly used models is the Rayleigh fading Wide-Sense Stationary
Uncorrelated Scattering (WSSUS) channel, which is usually simulated by one of two methods:
the Sum-of-Sinusoids (SoS) method or the Doppler spectrum filtering method [1]. Hardware and
software implementations of frequency-flat fading channels have been well studied; see, for
example, [1,2,3,4] and the references therein. Software implementation of frequency-selective
fading channels has also been well investigated [5,6,7]. However, hardware simulation of
frequency-selective fading channels still poses challenges in computational complexity
and simulation accuracy [8,9]. The most difficult aspect of frequency-selective fading
channel simulation is to accurately compute and incorporate the cross-correlation between
multiple channel taps in the discrete-time model. Although the WSSUS model assumes
multiple uncorrelated rays, the sampled discrete-time channel taps are often correlated due
to the bandpass nature of wireless communications systems. Many current hardware
implementations fail to account for this correlation and therefore produce inaccurate
channel characteristics.

In this paper, we propose a simple method to incorporate inter-tap correlation
in hardware implementations of discrete-time frequency-selective fading channels.
The proposed method employs the weight-delay-sum filtering method [10] to implement the
fractional delays of the multiple WSSUS rays. It combines the weight-delay-sum method
with SoS flat fading simulators and keeps the complexity low enough for real-time hardware
implementation. The proposed simulation method is implemented on Altera's Stratix II Field
Programmable Gate Array (FPGA) development kit, and the results closely match those of the
MATLAB software implementation. Compared with existing hardware implementation methods,
the proposed method offers lower computational complexity, a higher data rate, and more
accurate waveforms and correlation properties.


2 DISCRETE-TIME FREQUENCY-SELECTIVE FADING CHANNEL MODELS


The frequency-selective Rayleigh fading channel model is often expressed as the
baseband equivalent channel impulse response of a multipath channel [1]

$$h(\tau, t) = \sum_{i=1}^{I} P_i\, g(\tau - \tau_i) \exp[-j(\omega_i (t - \tau_i) - \phi_i)] \tag{1}$$

where $P_i$, $\omega_i$, and $\tau_i$ are the gain, angular Doppler frequency, and relative
delay of the $i$-th path, respectively, and $\phi_i$ is its phase. The pulse shaping filter
$g(\tau)$ is a bandpass filter, often implemented as a raised cosine filter [1]. The multipath
gains $P_i$ are normalized to yield unit total power of the response. It is commonly assumed
that the multiple rays in (1) are wide-sense stationary uncorrelated scattering (WSSUS).

When the delay spread $\tau_{\max} - \tau_{\min}$ is much smaller than the symbol interval $T_{sym}$,
the channel impulse response can be approximated as frequency-flat fading

$$h(t) = \sum_{i=1}^{I} P_i\, g(t) \exp[-j(\omega_i t - \phi_i)] \tag{2}$$

If sampled at the interval $T_{sym}$, the discrete-time flat fading channel can be efficiently
simulated by several SoS models [1,3]. A typical one is

$$
\begin{aligned}
Z(k) &= Z_c(k) + j Z_s(k), \\
Z_c(k) &= \sqrt{\frac{2}{M}} \sum_{n=1}^{M} \cos(\omega_d k \cos\alpha_n + \phi_n), \\
Z_s(k) &= \sqrt{\frac{2}{M}} \sum_{n=1}^{M} \cos(\omega_d k \sin\alpha_n + \psi_n), \\
\alpha_n &= \frac{2\pi n - \pi + \theta}{4M}, \quad n = 1, 2, \cdots, M,
\end{aligned}
\tag{3}
$$

where $\omega_d$ is the maximum angular Doppler frequency, $M$ is the total number of sinusoids,
and $j = \sqrt{-1}$. The angle of arrival $\alpha_n$ is randomized by a uniformly distributed $\theta$, and $\phi_n$
and $\psi_n$ are the random phases of the in-phase and quadrature components, respectively.
The random variables $\phi_n$, $\psi_n$, and $\theta$ are statistically independent and uniformly distributed
on $[-\pi, \pi)$ for all $n$.
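
The SoS model in (3) can be sketched in a few lines of Python. This is a minimal floating-point illustration, not the FPGA design: the function name `sos_flat_fading`, the use of NumPy, and the specific normalization $\sqrt{2/M}$ with $\alpha_n = (2\pi n - \pi + \theta)/(4M)$ are assumptions made for the example.

```python
import numpy as np

def sos_flat_fading(num_samples, omega_d, M=16, rng=None):
    """Generate Z(k) = Zc(k) + j*Zs(k) for k = 0..num_samples-1.

    omega_d is the maximum angular Doppler frequency per sample (assumed
    already normalized to the sampling interval).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = np.arange(1, M + 1)
    theta = rng.uniform(-np.pi, np.pi)          # common angle offset
    phi = rng.uniform(-np.pi, np.pi, size=M)    # in-phase random phases
    psi = rng.uniform(-np.pi, np.pi, size=M)    # quadrature random phases
    alpha = (2.0 * np.pi * n - np.pi + theta) / (4.0 * M)
    k = np.arange(num_samples)[:, None]         # column of time indices
    zc = np.sqrt(2.0 / M) * np.cos(omega_d * k * np.cos(alpha) + phi).sum(axis=1)
    zs = np.sqrt(2.0 / M) * np.cos(omega_d * k * np.sin(alpha) + psi).sum(axis=1)
    return zc + 1j * zs
```

With this normalization each of $Z_c$ and $Z_s$ has unit average power over the random phases, so the average power of $Z(k)$ is 2.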

When the delay spread is comparable to or larger than the symbol interval,
the fading channel is frequency-selective and inter-symbol interference often spans
multiple symbol intervals. The sampled channel response (1) becomes a time-varying FIR
system

$$H(l,k) = \sum_{i=1}^{I} P_i\, g(lT_s - \tau_i)\, Z_i(k), \tag{4}$$

where $Z_i(k)$, $i = 1, \cdots, I$, are independent flat fading CIRs generated by (3). However, the
multipath delays $\tau_i$ are often fractions of the symbol interval. Sampling the fractional delays
at $T_{sym}$ (or at $T_s = T_{sym}/U$, where the upsampling rate is typically $U \in [1,10]$) results in
correlated inter-symbol delay taps [5,6]

$$E[h(l_1,k)\, h^{*}(l_2,k)] = \sum_{i=1}^{I} \sum_{k=1}^{I} P_i P_k\, g(l_1 T_s - \tau_i)\, g^{*}(l_2 T_s - \tau_k). \tag{5}$$

Note that $R_{gg}(\xi) = E[g(\tau)\, g^{*}(\tau + \xi)]$ is the autocorrelation of the bandpass filter $g(\tau)$. The
resulting discrete-time power delay profile is shown in Fig. 1.

Several methods have been proposed to incorporate the inter-tap correlation in
frequency-selective channel modeling, including the spectrum factorization method [7] and
the correlation matrix factorization method [5,6]. These methods have been shown to
yield accurate channel models with low computational complexity in software-based
simulation. However, evaluating the correlation coefficients and factorizing the spectrum
and/or correlation matrix are costly in hardware. Therefore, we propose
a simple weight-delay-sum filtering method [10] to implement the fractional delays,

$$
\begin{aligned}
H(l,k) &= H_c(l,k) + j H_s(l,k), \\
H_c(l,k) &= \sum_{i=1}^{I} E_{l,i}\, Z_{ci}(k)\, \delta(l - l_i), \\
H_s(l,k) &= \sum_{i=1}^{I} E_{l,i}\, Z_{si}(k)\, \delta(l - l_i),
\end{aligned}
\tag{6}
$$

where $l_i = \lfloor \tau_i / T_s \rfloor$, and $E_{l,i} = g(lT_s - \tau_i)$ are $T_s$-spaced samples of the delayed bandpass
filter, as shown in Fig. 2, where the raised cosine pulse is truncated to $\pm L_g T_s$ with $L_g = 3$.
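
The weight-delay-sum idea can be illustrated in software by evaluating the sampled response $H(l,k) = \sum_i P_i\, g(lT_s - \tau_i)\, Z_i(k)$ of (4) directly. The sketch below is a floating-point illustration under stated assumptions, not the hardware design: the function names, the roll-off factor of 0.5, the tap indexing $l = 0, \ldots, L-1$, and the truncation argument `span` are all hypothetical choices made for the example.

```python
import numpy as np

def raised_cosine(t, T=1.0, beta=0.5):
    """Raised-cosine pulse g(t) for symbol interval T and roll-off beta (0 < beta <= 1)."""
    x = np.asarray(t, dtype=float) / T
    denom = 1.0 - (2.0 * beta * x) ** 2
    singular = np.isclose(denom, 0.0)
    g = np.sinc(x) * np.cos(np.pi * beta * x) / np.where(singular, 1.0, denom)
    # At t = +-T/(2*beta) the formula is 0/0; use its limit value there.
    return np.where(singular, (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta)), g)

def tap_weights(delays, Ts, Tsym, num_taps, span, beta=0.5):
    """E[l, i] = g(l*Ts - tau_i), set to zero where |l*Ts - tau_i| > span."""
    t = np.arange(num_taps)[:, None] * Ts - np.asarray(delays, dtype=float)[None, :]
    E = raised_cosine(t, T=Tsym, beta=beta)
    return np.where(np.abs(t) > span, 0.0, E)

def weight_delay_sum(Z, gains, E):
    """H[l, k] = sum_i E[l, i] * P_i * Z[i, k] for flat fading rows Z[i, :]."""
    return E @ (np.asarray(gains)[:, None] * np.asarray(Z))
```

Because every tap is a weighted sum of the same $I$ flat fading waveforms, neighbouring taps share rays and are therefore correlated, without any matrix factorization.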

The simple weight-delay-sum method captures the inter-tap correlation of frequency-selective
channels with very low computational complexity. The tradeoff is that it requires
$I$ independent flat fading waveforms rather than the $L = 2L_g + 1 + \lceil \tau_{\max}/T_s \rceil$ required by the
correlation matrix factorization method [5]. In practice, the number of multipaths $I$ is often
slightly larger than the total number of taps $L$.
Figure 1. (a) A typical urban channel PDP with multiple WSSUS rays. (b) Average
power per tap of the $T_s$-spaced discrete-time channel response.


Figure 2. Bandpass filter of the $i$-th ray sampled at $T_{sym}$, where the delay $\tau_i$ is a
fraction of $T_{sym}$.


3 FPGA IMPLEMENTATION


For real-time hardware implementation, the frequency-selective channel waveforms must
be sampled at the same rate as the receiver, and the received signal (after proper delay) is
then

$$y(k) = \sum_{l=0}^{L-1} H(l,k) \cdot x(k-l) + v(k), \tag{7}$$

where $x(k)$ is the transmitted signal and $v(k)$ is the background white Gaussian noise. If
the symbol interval is $T_{sym} = 1\,\mu s$ and the upsampling rate is $U = 10$, then $L$ samples of
$H(l,k)$ are needed every $T_s = 0.1\,\mu s$, where $L$ is on the order of tens. This requirement
is stringent for sample-by-sample processing. However, modern communications systems
often employ block transmission, and the channel response often varies slowly with time. We
exploit this feature and propose an efficient implementation with block processing.
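
As an illustration of (7), the following sketch applies a time-varying tap matrix to a transmitted sequence. It is a software illustration only; the function name `apply_channel`, the array layout `H[l, k]`, and the zero initial state ($x(k) = 0$ for $k < 0$) are assumptions made for the example.

```python
import numpy as np

def apply_channel(x, H, noise_std=0.0, rng=None):
    """y(k) = sum_l H[l, k] * x(k - l) + v(k), with x(k) = 0 for k < 0."""
    x = np.asarray(x, dtype=complex)
    H = np.asarray(H, dtype=complex)
    L, K = H.shape
    if x.shape != (K,):
        raise ValueError("x must have one sample per column of H")
    y = np.zeros(K, dtype=complex)
    for l in range(min(L, K)):
        # Tap l weights the input delayed by l samples.
        y[l:] += H[l, l:] * x[:K - l]
    if noise_std > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        # Circularly symmetric complex Gaussian noise with variance noise_std**2.
        v = rng.normal(size=K) + 1j * rng.normal(size=K)
        y += noise_std * v / np.sqrt(2.0)
    return y
```

For a slowly varying channel, the columns of `H` can be held constant over a block of samples, which is the property that block processing relies on.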

The proposed hardware implementation scheme consists of three major blocks: a
parameter generator bank, a flat fading generator, and a selective fading generator module,
as shown in Fig. 3, where MUX is a multiplexer. The parameter generator bank generates
and stores all random variables needed for each of the $I$ WSSUS rays. These include the random
phase vectors $\Phi_i = [\phi_{1,i}, \phi_{2,i}, \cdots, \phi_{M,i}]$ and $\Psi_i = [\psi_{1,i}, \psi_{2,i}, \cdots, \psi_{M,i}]$, the maximum
Doppler frequencies $\omega_{d,i}$, the random phases $\theta_i$, and the power delay profile vectors $P = \{P_i\}$
and $D = \{\tau_i\}$. The parameter generator bank also computes and stores the quantities
$\cos\alpha_{n,i}$ and $\sin\alpha_{n,i}$ for all $n$ and $i$. The multiplexer selects the parameters of the $i$-th ray
and sends them to the flat fading generator in series. The flat fading generator generates
the real and imaginary components of the $i$-th flat fading channel response according to (3)
and outputs $Z_{ci}(k)$ and $Z_{si}(k)$ to two buffers of the selective fading generator. When the
$k$-th flat fading samples of all $I$ rays are ready in the buffers, the selective fading generator
processes them with the weight-delay-sum filtering method according to (6).

Figure 3. Block diagram of FPGA implementation of the frequency-selective Rayleigh
fading simulator.


The parameter generator bank is implemented straightforwardly with several
uniform random number generators, and the sine and cosine functions are generated by Look-Up
Tables (LUTs). The flat fading generator is implemented as in Fig. 4, where $M$ cosine
functions are summed in series to generate the real or imaginary component of the fading
response. Different data formats are used for different parameters according to their
fixed-point precision. For example, the random phase and Doppler parameters use the format (3:20),
the number $M$ uses (2:10), the time index $k$ uses (21:30), and the channel responses use
(3:20). Thus, the accuracy of the output can reach $2^{-20} \approx 10^{-6}$.
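
The link between 20 fractional bits and an accuracy of about $10^{-6}$ can be checked with a small quantization sketch. It assumes, which the text does not spell out, that a format such as (3:20) means a signed two's-complement word with 3 integer bits (including sign) and 20 fractional bits; the function name `quantize` is hypothetical.

```python
def quantize(x, int_bits=3, frac_bits=20):
    """Round x to a signed fixed-point grid with saturation (assumed format)."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits - 1)) * scale        # most negative integer code
    hi = (1 << (int_bits - 1)) * scale - 1     # most positive integer code
    code = min(max(round(x * scale), lo), hi)
    return code / scale
```

Under this assumption the grid spacing is $2^{-20} \approx 9.5 \times 10^{-7}$, and rounding to the nearest grid point keeps the error within half of that for values inside the representable range.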

The selective fading generator is the core module of the simulator, and its structure
is shown in Fig. 5. The $i$-th flat fading channel response is multiplied by its gain
$P_i$ according to the PDP specification before being stored in the buffers. The weights
$E_{l,i} = g(lT_s - \tau_i)$ are computed through multiple LUTs, which store the raised cosine pulse
for $t = [-L_g T_{sym} : L_g T_{sym}]$ at a high resolution. The LUTs take the delay parameter
$D_i = \tau_i$ as input and output the corresponding weights $E_{l,i}$ to the multipliers
(MULs). Multiple MULs weight the corresponding flat fading rays in parallel. The
accumulators implement the summation of (6) and output a block of $H_c(l,k)$ and $H_s(l,k)$
in parallel.

Figure 4. FPGA implementation of the flat fading generator module.


4 IMPLEMENTATION EXAMPLE AND PERFORMANCE EVALUATION


The proposed frequency-selective fading channel simulator was implemented on an
Altera Stratix II FPGA/DSP development kit using Quartus II version 8.0 and DSP
Builder version 5.0. DSP Builder provides an interface between the
FPGA hardware and MATLAB Simulink, so the channel specification parameters
were easily passed to the channel simulator and its outputs were
logged to data files in Simulink.

Figure 5. FPGA implementation of frequency-selective fading generator module.


As an example, results for a typical urban channel model with 20 WSSUS rays are
presented here. The implementation parameters were: the number of sinusoids $M = 16$, the
upsampling rate $U = 10$, an output block size of $10 \times 1$ per accumulator, and a channel
length of $L = 60$ (in units of $T_s$). With the FPGA clock period set to 20 ns,
the simulator meets the real-time requirement for a symbol interval of $T_{sym} = 6.4\,\mu s$. The logic
utilization of the single FPGA chip was 33%, including 15704 (31%) combinational ALUTs and 1383
(2%) dedicated logic registers. A total of 822484 block memory bits (32%) were occupied. The
proposed low-complexity hardware implementation thus occupies about one third of the resources
of the single FPGA chip.
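
A rough consistency check on the real-time figure, assuming (this is not stated in the text) that the serial flat fading generator evaluates one cosine term per clock cycle and that pipeline latency is negligible:

$$I \cdot M \cdot T_{clk} = 20 \times 16 \times 20\,\text{ns} = 6400\,\text{ns} = 6.4\,\mu s$$

Under that assumption, one new flat fading sample for each of the 20 rays fits within a single symbol interval of $6.4\,\mu s$.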

The performance of the hardware simulator was evaluated from its output waveforms.
First, the autocorrelation and cross-correlation of the flat fading generator outputs $Z_{ci}(k)$
and $Z_{si}(k)$ were computed by averaging over five trials, each of which generated $2 \times 10^6$
samples. The results are shown in Fig. 6.

The cross-correlations between $H_c(l,k)$ and $H_c(l + \{1,2,4\}, k)$ are shown in Fig. 7.
With the accuracy of the MATLAB simulations set to $10^{-6}$, the same as that
of the FPGA outputs, all FPGA outputs match the MATLAB simulations very well.
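
The text does not say how the correlations were computed from the logged waveforms. One plausible procedure, sketched here as an assumption, is to time-average lagged products within each trial and then average the per-trial estimates; the function names are hypothetical.

```python
import numpy as np

def time_avg_xcorr(a, b, max_lag):
    """Estimate E[a(k) * conj(b(k + m))] for m = 0..max_lag by time averaging."""
    a = np.asarray(a)
    b = np.asarray(b)
    n = len(a) - max_lag  # same number of products at every lag
    return np.array([np.mean(a[:n] * np.conj(b[m:m + n])) for m in range(max_lag + 1)])

def trial_average(estimates):
    """Average per-trial correlation estimates (one row per trial)."""
    return np.mean(np.asarray(estimates), axis=0)
```

Passing the same sequence for `a` and `b` gives an autocorrelation estimate; passing the in-phase and quadrature components gives their cross-correlation.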
Figure 6. Autocorrelation and cross-correlation of the $i$-th flat fading ray sampled
at $T_{sym}$ intervals. The normalized Doppler frequency was $f_d T_{sym} = 0.0008$ and
$f_d = 125$ Hz.

Figure 7. Cross-correlation between $H_c(l,k)$ and $H_c(l+1,k)$ of the frequency-selective
channel simulator.


5  CONCLUSIONS 


A low-complexity FPGA implementation of frequency-selective Rayleigh fading
channels has been proposed. It employs simple weight-delay-sum processing to incorporate
the inter-tap correlation of discrete-time channel models. The proposed simulator
has been implemented on Altera's Stratix II development kit, and the results of the hardware
simulator match those of the software simulation. The advantages of the proposed simulator
include its flexibility for parameter changes and its simple, compact implementation.
PAPER

V. VALIDATION OF THE TRIPLY SELECTIVE FADING CHANNEL
MODEL THROUGH A MIMO TEST BED
AND EXPERIMENTAL RESULTS

Saurav Subedi, Huang Lou, Fei Ren, Mingxi Wang, and Y. R. Zheng

Abstract: Multiple-input multiple-output (MIMO) channels are often triply selective, meaning
that they have spatial, temporal and inter-tap correlation. The temporal correlation is well
characterized by its Doppler spectrum, but spatial and inter-tap correlation and their impact
on MIMO channels are less studied in the literature. A MIMO testbed has been
established to measure the impulse response of MIMO channels, and an estimation method
is developed to quantitatively measure the correlation matrices from experimental data.


1  INTRODUCTION 


The multiple-input multiple-output (MIMO) channel is analyzed as a triply selective
fading channel in the existing literature [1], [2]. This model accounts for the space-selective,
time-selective and frequency-selective nature of MIMO channels. It is shown in [1] that the
correlation between channel coefficients of the discrete-time MIMO channel can be written
as a Kronecker product of the temporal, inter-tap and spatial correlations.
It is argued in [2] that this model is not accurate and that the Kronecker product for
the spatial correlations does not hold in general for frequency-selective channels.
The underlying assumptions in [1] are clarified in [3], where firm conclusions are drawn
that confirm the accuracy of this discrete-time model for MIMO triply selective fading
channels. A general space-time cross-correlation function incorporating a wide range of
parameters of the MIMO fading channel is proposed in [4]. In [5], vector autoregressive
(AR) stochastic models are proposed to simulate multiple cross-correlated Rician fading
channels. The joint effect of spatial and temporal correlation is studied in [6], which
analyzes the ergodic capacity of a MIMO channel based on the transmit and receive antenna
correlation matrices.

This paper validates the triply selective fading channel model through experimental
results. We verify the results by decomposing the channel coefficient covariance
matrix into its Kronecker factors. Approaches for decomposing a Kronecker product
into its components are suggested in [7]. However, those methods are applicable only to
real matrices. In this paper, we propose a method for approximating the factors of a Kronecker
product, real or complex. Experimental data from a MIMO testbed is used to estimate
the channel impulse response (CIR) and to quantitatively estimate the spatial and inter-tap
correlation matrices.

This paper is organized as follows. Section 2 reviews the discrete-time MIMO
triply selective fading channel model. Section 3 describes the MIMO testbed and
the experimental setup. Section 4 details the channel estimation procedure, the correlation
matrix estimation procedure, and the results and analysis. Finally, Section 5 concludes the paper.


2  DISCRETE-TIME  TRIPLY  SELECTIVE  FADING  MODEL 


The input-output relationship of the MIMO channel in discrete time is described
in [1] as

$$\mathbf{y}(k) = \sum_{q=-Q_1}^{Q_2} \mathbf{H}(k,q) \cdot \mathbf{x}(k-q) + \mathbf{v}(k) \tag{1}$$

where $k$ is the time index, $Q_1$ and $Q_2$ are non-negative integers representing the range of
delay taps, yielding the total channel length $Q = Q_1 + Q_2 + 1$, $\mathbf{x}(k) = [x_1(k), x_2(k), \ldots, x_P(k)]^t$
is the transmitted signal vector, $\mathbf{y}(k) = [y_1(k), y_2(k), \ldots, y_O(k)]^t$ is the received vector, and
$\mathbf{v}(k) = [v_1(k), v_2(k), \ldots, v_O(k)]^t$ is the additive white Gaussian noise. The superscript $(\cdot)^t$
denotes the matrix transpose.


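The MIMO channel coefficient matrix $\mathbf{H}(k,q)$ at time instant $k$ and delay tap $q$ is
defined by

$$\mathbf{H}(k,q) = \begin{pmatrix} h_{1,1}(k,q) & \cdots & h_{1,P}(k,q) \\ \vdots & \ddots & \vdots \\ h_{O,1}(k,q) & \cdots & h_{O,P}(k,q) \end{pmatrix} \tag{2}$$

We reshape the matrices $\mathbf{H}(k,q)$ into an $(OPQ) \times 1$ coefficient vector

$$\mathbf{h}_{vec}(k) = [\mathbf{h}_{1,1}(k), \ldots, \mathbf{h}_{1,P}(k) \,|\, \cdots \,|\, \mathbf{h}_{O,1}(k), \ldots, \mathbf{h}_{O,P}(k)]^t \tag{3}$$

where $\mathbf{h}_{o,p}(k)$ is the coefficient vector of the $(o,p)$-th sub-channel, given by
$\mathbf{h}_{o,p}(k) = [h_{o,p}(-Q_1,k), \cdots, h_{o,p}(Q_2,k)]$.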
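
The reshaping into a coefficient vector can be sketched with NumPy. The storage layout `H[q, o, p]` (tap index first) and the function name `vectorize_cir` are assumptions made for illustration; the ordering follows the description above, with the $Q$ taps of each sub-channel kept contiguous and sub-channels ordered by receive index, then transmit index.

```python
import numpy as np

def vectorize_cir(H):
    """H[q, o, p] -> vector ordered sub-channel by sub-channel, taps innermost."""
    H = np.asarray(H)
    # Move the tap axis last so each sub-channel's Q taps stay contiguous.
    return np.transpose(H, (1, 2, 0)).reshape(-1)
```

With this ordering, element $((o \cdot P + p) \cdot Q + q)$ of the vector (zero-based indices) holds tap $q$ of sub-channel $(o,p)$, which matches the index order of a Kronecker product taken as receive, then transmit, then tap.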


It is stated in [1] that the stochastic fading channel coefficient vector $\mathbf{h}_{vec}(k)$ is
zero-mean Gaussian distributed and that its covariance matrix $\mathbf{R}$ is given by

$$
\begin{aligned}
\mathbf{R} &= E[\mathbf{h}_{vec}(k_1) \cdot \mathbf{h}_{vec}^H(k_2)] \\
&= (\Psi_{Rx} \otimes \Psi_{Tx} \otimes \Psi_{Tap}) \cdot J_0[2\pi f_d (k_1 - k_2) T_s]
\end{aligned}
\tag{4}
$$

where $(\cdot)^H$ denotes the Hermitian operator, $\otimes$ denotes the Kronecker product, $\Psi_{Rx}$ and
$\Psi_{Tx}$ are the spatial correlation matrices at the receiver and transmitter, respectively, and
$\Psi_{Tap}$ is the intertap covariance matrix. These matrices are defined in (5), (6) and (7). The
factor $J_0[2\pi f_d (k_1 - k_2) T_s]$ describes the temporal correlation, where $f_d$ is the maximum Doppler
frequency, $T_s$ is the sampling period, and $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind.
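$$\Psi_{Rx} = \begin{pmatrix} \rho_{Rx}(1,1) & \cdots & \rho_{Rx}(1,O) \\ \vdots & \ddots & \vdots \\ \rho_{Rx}(O,1) & \cdots & \rho_{Rx}(O,O) \end{pmatrix} \tag{5}$$

$$\Psi_{Tx} = \begin{pmatrix} \rho_{Tx}(1,1) & \cdots & \rho_{Tx}(1,P) \\ \vdots & \ddots & \vdots \\ \rho_{Tx}(P,1) & \cdots & \rho_{Tx}(P,P) \end{pmatrix} \tag{6}$$

$$\Psi_{Tap} = \begin{pmatrix} \psi(-Q_1,-Q_1) & \cdots & \psi(-Q_1,Q_2) \\ \vdots & \ddots & \vdots \\ \psi(Q_2,-Q_1) & \cdots & \psi(Q_2,Q_2) \end{pmatrix} \tag{7}$$

where $\rho_{Rx}(m,p)$ is the receive correlation coefficient between receive antennas $m$ and $p$. Similarly,
$\rho_{Tx}(n,q)$ is the transmit correlation coefficient between transmit antennas $n$ and $q$. The
elements of the intertap covariance matrix are determined by the power delay profiles.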
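
To make the structure of (4) concrete, the sketch below assembles the covariance from three small factor matrices. It is an illustration under assumptions: the function names are hypothetical, the factor matrices are supplied by the caller, and $J_0$ is evaluated from its integral form $J_0(x) = \frac{1}{\pi}\int_0^{\pi} \cos(x \sin t)\, dt$ so that only NumPy is needed.

```python
import numpy as np

def bessel_j0(x, num_points=2000):
    """J0(x) from its integral form (1/pi) * int_0^pi cos(x sin t) dt, midpoint rule."""
    t = (np.arange(num_points) + 0.5) * np.pi / num_points
    return float(np.mean(np.cos(x * np.sin(t))))

def triply_selective_cov(psi_rx, psi_tx, psi_tap, fd, Ts, lag=0):
    """Covariance of h_vec(k1) and h_vec(k2) for lag = k1 - k2, following (4)."""
    spatial_tap = np.kron(np.kron(psi_rx, psi_tx), psi_tap)
    return spatial_tap * bessel_j0(2.0 * np.pi * fd * lag * Ts)
```

At zero lag the Bessel factor equals one, so the covariance reduces to the Kronecker product of the three matrices; each $(Q \times Q)$ block is then a scaled copy of the intertap covariance matrix.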


This  paper  focuses  on  the  validation  of  the  triply  selective  fading  channel  model  using 
(4)  through  the  estimation  of  the  spatial  correlation  matrices  and  the  intertap  covariance 
matrix. 


3  TESTBED  AND  EXPERIMENT 


A 2x2 MIMO-OFDM testbed has been developed using an Altera Stratix III EP3SL150F
field-programmable gate array (FPGA) DSP development kit. The discrete-time MIMO
triply selective fading channel model in [1] is the basis for the design of this testbed. A
hardware implementation of discrete-time MIMO triply selective fading channel emulators is
proposed in [8].

On the transmitter side, two independent data streams are generated in the Stratix III
development kit. The outputs are up-converted to an intermediate frequency (IF) of 17.5 MHz
and then fed into the radio frequency (RF) block, RF2-3000UCV1, to
be transmitted at 915 MHz. An MPA-10-40 is used for power amplification. The AFG3252
and FS725 devices are the clock sources for all other devices. The setup architecture of the
transmitter is shown in Fig. 1.


Figure  1.  Transmitter  setup  architecture. 

On the receiver side, RF signals are first down-converted to a 70 MHz IF by the down-converter
block, RF200-2500RV1. Baseband data streams are then generated and recorded
in the Stratix III development kit and transferred to a PC. The AFG3252 and FS725 devices provide
the clock sources. The receiver setup architecture is shown in Fig. 2.


Figure  2.  Receiver  setup  architecture. 


A bandwidth configuration of 3.90 MHz is used in this testbed. The number of
OFDM subcarriers is 256 and the cyclic prefix length is 64 samples. Experiments were
carried out using three different modulation schemes: QPSK, 8PSK and 16QAM.
The subcarriers are used for channel sounding. Although BPSK is sufficient for channel
sounding, the transceiver was originally designed for MIMO communications, so the QPSK,
8PSK and 16QAM modulation schemes are used. Measurements were made for two different
experimental setups: one with both the transmitter and receiver located in the same room
(inside 208) and the other with the transmitter and receiver located in two different rooms (208
and 212) across a hallway, as shown in the floor plan in Fig. 3.

Figure 3. Floor plan used for the experiment.


Experimental data from this testbed is used for channel estimation and subsequent
analysis.


4 PROCEDURE, RESULTS AND ANALYSIS


4.1  Channel  Estimation 

The time-domain least squares (LS) method is used to estimate the channel impulse
response (CIR) of each subchannel of the 2x2 MIMO system from the known training
sequence and the received sequence. The LS estimate, detailed in Chapter 8 of [9], is obtained
as

$$\mathbf{h}_{LS} = (\mathbf{X}^H \mathbf{X})^{-1} \mathbf{X}^H \mathbf{y} \tag{8}$$

where $(\cdot)^H$ and $(\cdot)^{-1}$ represent the Hermitian and inverse operations, respectively, $\mathbf{X}$ is the
circulant training sequence matrix and $\mathbf{y}$ is the received sequence. The matrix $\mathbf{X}$ is formed
as

$$\mathbf{X} = \begin{pmatrix} x_Q & \cdots & x_1 & x_0 \\ x_{Q+1} & \cdots & x_2 & x_1 \\ \vdots & & \vdots & \vdots \\ x_{Q+P-1} & \cdots & x_P & x_{P-1} \end{pmatrix} \tag{9}$$

where $Q$ is the number of channel taps and $P$ is the number of pilot symbols for each antenna.
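
A minimal single-subchannel sketch of the LS estimate in (8) is given below. It is not the testbed code: the function names are hypothetical, the training matrix is built with one column per estimated tap, and the normal equations are solved with `numpy.linalg.solve` instead of forming the inverse explicitly.

```python
import numpy as np

def training_matrix(x, num_taps):
    """Row r is [x[r+Q-1], ..., x[r+1], x[r]], so X @ h is a convolution."""
    x = np.asarray(x)
    num_rows = len(x) - num_taps + 1
    return np.array([x[r:r + num_taps][::-1] for r in range(num_rows)])

def ls_channel_estimate(X, y):
    """h_LS = (X^H X)^{-1} X^H y, solved without forming the inverse."""
    Xh = X.conj().T
    return np.linalg.solve(Xh @ X, Xh @ y)
```

In the noise-free case with a sufficiently rich training sequence, the estimate reproduces the channel taps exactly; with noise it is the least squares fit over the window.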


A long probing sequence is transmitted and the CIRs are estimated progressively
using cascading windows of size $N_p = 120$ symbols. An example of the 30-tap CIRs of the
four subchannels is shown in Fig. 4, where 80 cascading windows are used across the length
of the transmitted data sequence. Although the signal bandwidth is only 3.9 MHz, the
baseband equivalent channel experienced a multipath delay spread spanning 30 taps. This
is because both the transmitter and receiver antennas were placed very low, only a meter above
the floor. The situation differs from the case where one end is placed very high, as at a base
station, where multipath may not be significant. It illustrates the difference between a
mobile-to-mobile channel and a base-station-to-mobile channel. The number of windows can be
increased, or overlapping windows can be used, to estimate the CIRs of highly scattering
channels such as underwater acoustic channels.


Figure  4.  Magnitudes  of  channel  impulse  responses  for  four  subchannels. 


4.2  Estimation  of  the  Channel  Coefficient  Covariance  Matrix 

The channel coefficient covariance matrix is calculated from the estimated channel
coefficients. The $(OPQ \times OPQ)$ covariance matrix $\mathbf{R}$ is calculated using (4). The estimated
channel coefficient covariance matrix is shown in Fig. 5.
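
The text does not spell out how the expectation in (4) is estimated from the measurements. One common choice, assumed here purely for illustration, is a zero-lag sample average over the coefficient vectors obtained from the successive estimation windows; the function name is hypothetical.

```python
import numpy as np

def sample_covariance(snapshots):
    """Average of h h^H over the columns of `snapshots` (one column per window)."""
    Hm = np.asarray(snapshots, dtype=complex)
    return (Hm @ Hm.conj().T) / Hm.shape[1]
```

Each column would hold one $(OPQ) \times 1$ coefficient vector, so 80 windows would give an $(OPQ \times 80)$ input and an $(OPQ \times OPQ)$ Hermitian output.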

Figure 5. Magnitude of estimated channel coefficient covariance matrix.


4.3  Decomposition  of  the  Kronecker  Product 

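The Kronecker product of two matrices $\mathbf{A}$ and $\mathbf{B}$ is defined as

$$\mathbf{C} = \mathbf{A} \otimes \mathbf{B} = \begin{pmatrix} a_{11}\mathbf{B} & \cdots & a_{1n}\mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{m1}\mathbf{B} & \cdots & a_{mn}\mathbf{B} \end{pmatrix} \tag{10}$$

where $\mathbf{A}$ is an $(m \times n)$ matrix, $\mathbf{B}$ is a $(p \times q)$ matrix, and the resulting Kronecker
product $\mathbf{C}$ is of size $(mp \times nq)$.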


The problem at hand is to estimate $\mathbf{A}$ and $\mathbf{B}$ from a given Kronecker
product $\mathbf{C}$. Consider the first block of elements of $\mathbf{C}$, say $\mathbf{C}_{11}$, which is a
$(p \times q)$ matrix given by

$$\mathbf{C}_{11} = a_{11} \mathbf{B} \tag{11}$$

Averaging all the elements of $\mathbf{C}_{11}$ yields the product of the scalar $a_{11}$ and the
mean of all the elements of $\mathbf{B}$, as shown in (12):

$$E[\mathbf{C}_{11}] = a_{11} E[\mathbf{B}] \tag{12}$$

This isolates the first element of $\mathbf{A}$ from the Kronecker product. Repeating the same process
for the other blocks yields the other elements of $\mathbf{A}$. The resulting estimate of $\mathbf{A}$ is therefore
a scaled version of the actual $\mathbf{A}$ and retains its spatial properties.

In this paper, we estimate the spatial correlation matrix $\Psi_{TRx}$ from the channel
coefficient covariance matrix $\mathbf{R}$ using the method explained in (12).
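
The block-averaging step can be sketched as follows. The function name `estimate_left_factor` is hypothetical, and the sketch assumes the mean of the elements of $\mathbf{B}$ is nonzero, since otherwise the block means carry no information about $\mathbf{A}$.

```python
import numpy as np

def estimate_left_factor(C, m, n):
    """Return the (m x n) matrix of block means of C = A kron B, i.e. A * mean(B)."""
    C = np.asarray(C)
    p, q = C.shape[0] // m, C.shape[1] // n
    # Axes after reshape: (block row, row in block, block col, col in block).
    return C.reshape(m, p, n, q).mean(axis=(1, 3))
```

Dividing the result by one of its own entries, for example the first, removes the unknown scale and leaves the relative structure of $\mathbf{A}$.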

4.4  Estimation  of  Intertap  Covariance  Matrix  and  Spatial  Correlation  Matrix 

We estimate the $(Q \times Q)$ intertap covariance matrices for each subchannel. Using the
correlation matrix distance (CMD) as a metric [10], we show that these intertap covariance
matrices have identical spatial structure. The CMD between two correlation
matrices $\mathbf{R}_1$ and $\mathbf{R}_2$ is defined as

$$d_{corr}(\mathbf{R}_1, \mathbf{R}_2) = 1 - \frac{tr\{\mathbf{R}_1 \mathbf{R}_2\}}{\|\mathbf{R}_1\|_f \, \|\mathbf{R}_2\|_f} \tag{13}$$

where $tr\{\cdot\}$ denotes the trace of a matrix and $\|\cdot\|_f$ is the Frobenius norm. The CMD is
zero if the correlation matrices are equal up to a scaling factor and reaches one when they differ
to the maximum extent. A small value thus indicates that the matrices are nearly spatially identical.
Results are summarized in Table 1 for data obtained from the different experimental setups.
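
A direct NumPy sketch of the CMD is shown below. The function name is hypothetical, and taking the real part of the trace assumes both inputs are Hermitian, in which case the trace of their product is real.

```python
import numpy as np

def correlation_matrix_distance(R1, R2):
    """CMD = 1 - tr{R1 R2} / (||R1||_F * ||R2||_F) for Hermitian R1, R2."""
    R1 = np.asarray(R1, dtype=complex)
    R2 = np.asarray(R2, dtype=complex)
    num = np.trace(R1 @ R2).real
    den = np.linalg.norm(R1, "fro") * np.linalg.norm(R2, "fro")
    return 1.0 - num / den
```

Scaling either matrix by a positive constant leaves the value unchanged, which is why the metric can compare matrices whose overall scale is unknown.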

These results agree with the assumption in [3] that the power delay profile
of the physical channel model is identical for all transmit and receive antenna indices.
We compute an average intertap covariance matrix and use it as one of the Kronecker
factors of the channel coefficient covariance matrix to estimate the spatial correlation
matrix. The spatial similarity among the intertap covariance matrices of the four subchannels
can be observed in Fig. 6. Fig. 7 shows the average intertap covariance
matrix, $\Psi_{Tap}$.

Table 1. Comparison of correlation matrices using CMD.

Experi.        Atten.   CMD        CMD        CMD        CMD        CMD        CMD
Setup          (dB)     R11,R12    R11,R21    R11,R22    R21,R12    R21,R22    R,Rverify
In 208         22       0.0259     0.0129     0.0959     0.0259     -          0.0713
In 208         26       0.0255     0.0169     0.5789     0.0255     -          0.0744
In 208         30       0.0159     0.0192     0.3133     0.0159     -          0.0647
208 and 212    2        0.2550     0.0822     0.1003     0.2550     0.0908     0.0942
208 and 212    6        0.0751     0.0230     0.0785     0.0751     0.0569     0.0355
208 and 212    10       0.1208     0.0244     0.0690     0.1208     0.1433     0.0403



The elements of the spatial correlation matrix are estimated from the channel
coefficient covariance matrix $\mathbf{R}$. The process in (12) gives a matrix spatially identical
to $\Psi_{TRx}$. We then calculate the Kronecker product of the estimated spatial
correlation matrix $\Psi_{TRx}$ and the average intertap covariance matrix $\Psi_{Tap}$ using

$$\mathbf{R}_{verify} = \Psi_{TRx} \otimes \Psi_{Tap} \tag{14}$$

to validate the approach used for the decomposition of the Kronecker product. Since the
transmitter-receiver setup in this experiment was static, the temporal correlation does
not have a significant impact on the results. The CMD metric is used to compare
the channel coefficient covariance matrices calculated using (4) and
(14). Results for 6 different instances are shown in Table 1. The matrices $\mathbf{R}_{ij}$ are
the correlation matrices of the $ij$-th subchannel. Fig. 8 shows the channel coefficient
covariance matrix estimated using (14). The $(4 \times 4)$ spatial correlation matrix $\Psi_{TRx}$
is itself a Kronecker product of $\Psi_{Rx}$ and $\Psi_{Tx}$.
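
A short worked step, not in the original text, shows why a scale-invariant metric suits this check. It assumes the ideal case in which $\mathbf{R} = \mathbf{A} \otimes \mathbf{B}$ holds exactly, the left factor is estimated by block means as $c\,\mathbf{A}$ with $c = E[\mathbf{B}]$, and $c$ is real and positive:

$$\mathbf{R}_{verify} = (c\,\mathbf{A}) \otimes \mathbf{B} = c\,(\mathbf{A} \otimes \mathbf{B}) = c\,\mathbf{R}, \qquad d_{corr}(\mathbf{R}, c\,\mathbf{R}) = 1 - \frac{c \; tr\{\mathbf{R}\mathbf{R}\}}{c \, \|\mathbf{R}\|_f^2} = 0$$

Here $tr\{\mathbf{R}\mathbf{R}\} = \|\mathbf{R}\|_f^2$ because $\mathbf{R}$ is Hermitian. Under these assumptions, a nonzero distance between the measured and reconstructed covariance matrices reflects departures from an exact Kronecker structure and estimation error, not the unknown scale factor.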

Figure 6. Magnitudes of intertap covariance matrices for each subchannel.


5  CONCLUSIONS 


In this paper, we validated the triply selective fading channel model through a
MIMO testbed and experimental results. The experimental results demonstrate that the
discrete-time triply selective fading channel can be expressed as separable temporal,
inter-tap and spatial correlations. Using the correlation matrix distance as a metric, we
show that the intertap correlations for all the subchannels are spatially identical. This
permits the estimation of the spatial correlation matrices through the decomposition of the
channel coefficient covariance matrix. Finally, we verify our results by recalculating
the Kronecker product of the estimated correlation matrices and comparing the result
with the covariance matrix obtained directly from the estimated channel coefficients.

Figure 7. Magnitude of averaged intertap covariance matrix, $\Psi_{Tap}$.

Figure 8. Kronecker product of estimated $\Psi_{TRx}$ and $\Psi_{Tap}$.
SECTION

2 CONCLUSIONS


This dissertation investigates hardware-based wireless fading channel
emulators and addresses the main challenges in their hardware implementation.
Hardware implementation methods are proposed for triply-selective
fading channel emulators with accurate correlation properties.
On-chip FRFGs and inter-tap correlation matrix generators are implemented. A
mixed P-S computational structure, which incorporates the three types of correlation
into the sub-channels, is proposed to make the best tradeoff between processing speed
and hardware usage. The proposed algorithms and designs have not only been simulated
with software tools but also implemented and verified on FPGA development
kit platforms. The emulated channels exhibit excellent statistical and correlation
properties that match the theoretical ones. This dissertation also validates the
triply selective fading channel model through a MIMO testbed and experimental
results. The experimental results demonstrate that the discrete-time triply selective fading
channel can be expressed as separable temporal, inter-tap and spatial correlations.

The contributions of my research during the Ph.D. study are summarized
in two journal papers and five conference papers, of which two journal papers and
three conference papers are included in this dissertation. The complete publication
list is included in Section 3.